ABSTRACT



In recent years, there has been considerable interest in the 
precise assessment of instructional outcomes. The inadequacy of 
norm-referenced devices has been recognized. In addition, there 
has been a movement toward gearing educational tests to the 
specific educational outcomes that instructional programs are in- 
tended to reflect. These tests are often referred to as 
criterion-referenced, domain-referenced, or mastery tests. 

A mastery test is typically designed to reflect specific 
educational objectives and is normally used to make decisions 
regarding student achievement. Such tests also form an integral 
part of any program evaluation, where the focus is on the number 
of students judged as competent in a given domain of performance. 
Other situations in which institutional decisions about individuals 
are required include: testing for certification in a profession;
testing for minimum competency, such as for high school graduation; 
and the assessment of basic skills. 

This study provides a basic technical framework for the 
design and use of mastery tests. The topics discussed are (a) 
appropriate ways to select test items, (b) practical methods for 
extracting the best information from test data, (c) efficient 
procedures for using data to make decisions, and (d) means for 
relating test scores to the instructional outcomes being evaluated. 
Statistical procedures and computer programs have been developed 
to help testing practitioners deal with these issues in a simple 
and convenient way. 

The solutions reported in this study are directed toward the 
improvement of educational testing in the context of instruction. 

AN OVERVIEW OF THE MASTERY TESTING PROJECT



Huynh Huynh 
Joseph C. Saunders



I. BACKGROUND

Recent developments and interest in adaptive instruction and 
mastery learning call for new testing procedures focusing on the 
evaluation of individual performance in terms of some competency 
criterion. Given that a domain of behaviors is uniquely defined by
the mastery of some unit of instruction, a test is deliberately
constructed to produce scores that reflect the degree of competency
in those behaviors. At the end of the period of instruction, the
test is administered to the individual student, and on the basis of
the observed test score he or she is classified in one of several
achievement categories. In typical instructional situations there
are two such categories, usually labeled mastery and nonmastery.

Using test scores to make decisions about individual students 
is a daily activity in any effort to evaluate instructional programs. 
When the objectives are clearly specified, an obvious concern of 
the evaluator is the number of students or trainees who have mas- 
tered any or all the objectives as a result of participating in the 
program. The classification of students actually serves a dual 
purpose: first, it pinpoints the objectives that a disproportionate 
number of students have failed to master, thus encouraging a closer
look at the instructional strategies for those objectives; second,
it identifies individual students who have not mastered some of the
objectives and for whom special provisions need to be made to
facilitate their attainment of these objectives.

The Mastery Testing Project was supported by Grant NIE-G-78-0087
with the National Institute of Education, Department of Education,
Huynh Huynh, Principal Investigator. Points of view or opinions
stated do not necessarily reflect NIE position or policy and no
official endorsement should be inferred. Requests for reprints of
the papers described in the Publication Series in Mastery Testing
should be addressed to Huynh Huynh, College of Education, University
of South Carolina, Columbia, South Carolina, 29208.

Thus, using test scores to make decisions is an integral part 
of the educational enterprise. In various stages of educational 
testing development, this effort has been known as criterion-referenced,
domain-referenced, or mastery testing. Though these
terms have different interpretations, it seems important to note 
that they often refer to different aspects of the same process. 
Consider, for example, the case in which test items are deliber- 
ately constructed (or selected from an item bank) to reflect 
specific educational objectives; the resulting test scores are 
referenced to these objectives for interpretation and are then used 
to assess the competency or mastery of the individual student with 
respect to each of the objectives. 

Criterion-Referenced and Domain-Referenced Testing

Though the term criterion-referenced is used by most testing 
practitioners (e.g., those working at school districts), the term 
domain-referenced has been used in the report to make it clear that
test items are referenced directly to specific educational objectives.
The term mastery, on the other hand, is used to draw attention
to the fact that test scores are used to make certain decisions
regarding the individual student. It may also be noted that it
would be difficult to make meaningful decisions on the basis of
test scores unless the test items can be directly referenced to a
well-defined domain of performance. (This domain may be defined by
a single objective or by several objectives; in these cases the
test is typically labeled objective-referenced.) When a student is
judged to be a master on the basis of a high test score, what in
fact has been mastered? In order to answer this question, the
objectives or domain of performances on which the student is to be 
judged must be specified in advance. If this line of reasoning is 
correct, then the process of mastery testing embodies the concept 
of domain-referenced testing. 

Minimum Competency Testing and Basic Skills Assessment

The procedures associated with mastery testing resemble those
used in minimum competency testing or in basic skills assessment. 
In attempting to reverse the decline in the level of student 
achievement over the last decade, several states have implemented 
statewide programs testing for minimum competency in the basic 
skills. Many of these programs aim to insure that high school 
graduates possess a minimum level of academic achievement or have 
acquired the skills required to function effectively as adults in 
American society. Minimum competency testing, in this sense, acts 
as a high school exit examination or what has been called a certi- 
fication examination. When used in this manner, minimum competency 
examinations do not have the positive connotation of some other 
basic skills assessment programs. The latter programs are specifi- 
cally designed for a continuous monitoring of the acquisition of 
basic skills (namely, reading, writing, and mathematics) across 
succeeding grade levels. The results of these continuous monitoring
programs are used to diagnose a student's deficiencies in the
basic skills and to provide for instructional remediation. 

Although sometimes differing in their ultimate purposes, mas- 
tery testing, minimum competency testing, and the monitoring of 
basic skills are similar in many aspects of test development and 
other technical problems. The selection or construction of test 
items relies heavily on a thoughtful specification of the educa- 
tional objectives or domain of skills to which scores are to be 
referenced via performance on the test items. The specifications 
for the items themselves must, in most instances, be worked out in 
considerable detail so that there will be a high degree of con- 
gruence between the test items and the corresponding educational 
objectives. Technical aspects held in common include issues such 
as setting passing scores (or performance standards), assessing
decision reliability, assessing errors of classification, determining
test length, selecting items to maximize the accuracy of
classifications, referencing test items to segments of the
curriculum or currently adopted textbooks, constructing alternate
forms, and studying bias in decisions based on test scores.

II. TECHNICAL PROBLEMS IN MASTERY TESTING 



For a period of two years (September 1, 1978, through August 31, 
1980), the National Institute of Education provided financial sup- 
port for the work of the principal investigator concerning some of 
the above-mentioned technical issues in mastery testing. This 
research has dealt with the following questions. 

(1) What are some of the optimum ways to approach the issue of
setting test passing scores in both large testing programs and in
a typical classroom situation? How should passing score judgments
based on the content of the test items be processed?

(2) In which ways should the concept of reliability in mastery
testing be formulated? How can reliability indices be approximated
when repeated testing of the same examinees is not feasible? Which
inferential procedures are appropriate for studies regarding estimates
of reliability?

(3) How should the rate of misclassification be assessed for
domain-referenced tests? What are the sampling characteristics of
the estimates?

(4) What approaches should be used to study the consequences of
making passing decisions on the basis of test scores? Which models
would be useful in forecasting the budgetary consequences associated
with the selection of a particular passing score?

(5) How should decisions based on test data be evaluated in
terms of efficiency or cost-effectiveness?

(6) What are appropriate ways to assess the sensitivity of a
test within the context of instruction?

(7) What are some of the scoring rules based on decision theory
which may be useful in the context of mastery testing?

(8) What are the appropriate procedures by which items can be
selected from an item bank to form a test which must meet specific
requirements regarding reliability or decision accuracy?

(9) What procedures are appropriate in formulating decisions 
based on multivariate test data? 

III. PUBLICATION SERIES IN MASTERY TESTING

As the Mastery Testing Project concludes, seventeen papers 
have been written. All have been distributed nationally through 
the Publication Series in Mastery Testing and are abstracted as 
follows. 

Research Memorandum 78-1 

Computation and Inference for Two Reliability 
Indices in Mastery Testing Based on 
the Beta-Binomial Model

Huynh Huynh 

Presented at the 17th Annual Southeastern Invitational Conference on 
Measurement in Education, University of North Carolina at Greensboro, 
December 8, 1978. Journal of Educational Statistics, Fall, 1979.

Abstract: In mastery testing the raw agreement index and the kappa
index may be secured via one test administration when the test scores 
follow beta-binomial distributions. This paper reports tables and a 
computer program which facilitate the computation of those indices 
and of their standard errors of estimate. Illustrations are provided 
in the form of confidence intervals, hypothesis testing, and minimum
sample sizes in reliability studies for mastery tests. 
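
The computation described in this abstract can be illustrated with a minimal sketch. It assumes that two parallel n-item forms are conditionally independent binomials given true ability and that true ability follows a beta distribution with parameters alpha and beta; under those assumptions the joint score distribution has a closed form and both indices follow by summation. The function names and the parameter values in the usage note are hypothetical, and the sketch does not reproduce the tables, standard errors, or computer program of the memorandum.

```python
from math import comb, exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def joint_pmf(x: int, y: int, n: int, alpha: float, beta: float) -> float:
    """P(X = x, Y = y) for two parallel n-item forms.

    Assumption: scores are independent binomials given true ability,
    and true ability follows a Beta(alpha, beta) distribution.
    """
    log_p = (
        log(comb(n, x))
        + log(comb(n, y))
        + log_beta(alpha + x + y, beta + 2 * n - x - y)
        - log_beta(alpha, beta)
    )
    return exp(log_p)


def agreement_indices(n: int, c: int, alpha: float, beta: float) -> tuple[float, float]:
    """Return (raw agreement p, kappa) for passing score c, with 1 <= c <= n."""
    both_pass = sum(joint_pmf(x, y, n, alpha, beta)
                    for x in range(c, n + 1) for y in range(c, n + 1))
    both_fail = sum(joint_pmf(x, y, n, alpha, beta)
                    for x in range(c) for y in range(c))
    # Marginal probability of passing a single form.
    pass_rate = sum(joint_pmf(x, y, n, alpha, beta)
                    for x in range(c, n + 1) for y in range(n + 1))
    p = both_pass + both_fail
    chance = pass_rate ** 2 + (1 - pass_rate) ** 2
    kappa = (p - chance) / (1 - chance)
    return p, kappa
```

For example, `agreement_indices(n=10, c=7, alpha=6.0, beta=3.0)` returns the two indices for a ten-item test with a passing score of seven; in practice alpha and beta would first have to be estimated from a single administration.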



Research Memorandum 78-2 

A Nonrandomized Minimax Solution for Passing Scores 
in the Binomial Error Model 

Huynh Huynh 

Psychometrika, June 1980.

Abstract: A nonrandomized minimax solution is presented for mastery
scores in the binomial error model. The computation does not require
prior knowledge regarding an individual examinee or group test data
for a population of examinees. The optimum mastery score minimizes
the maximum risk which would be incurred by misclassification. A
closed-form solution is provided for the case of constant losses, 
and tables are presented for a variety of situations including 
linear and quadratic losses. A scheme which allows for correction 
for guessing is also described. 


Research Memorandum 79-1

Accuracy of Two Procedures for
Estimating Reliability of Mastery Tests

Huynh Huynh
Joseph C. Saunders

Presented at the annual conference of the Eastern Educational
Research Association, Kiawah Island, South Carolina, February 22-24,
1979. A short version of this paper will appear in Journal of
Educational Measurement (in press).

Abstract: The beta-binomial estimates for the raw agreement index p
and the kappa index in mastery testing are compared with those based
on repeated testings in terms of bias and sampling stability. Across
a variety of test score distributions, test lengths, and mastery 
scores, the beta-binomial estimates tend to underestimate the cor- 
responding population values. The percent of bias, however, is 
negligible (about 2.5%) for p and moderate (about 10%) for kappa. 
Both beta-binomial estimates are almost twice as stable as those 
based on repeated testings. Though the beta-binomial estimates 
presume equality of item difficulty, the data presented indicate 
that even gross departures from equality do not affect the perfor- 
mance of the estimates. 


Bayesian and Empirical Bayes Approaches
to Setting Test Passing Scores 

Huynh Huynh 
Joseph C. Saunders 

Presented at the symposium "Psychometric approaches to domain- 
referenced testing" sponsored jointly by the American Educational 
Research Association and the National Council on Measurement in 
Education at their annual meetings in San Francisco, April 8-12, 
1979. 

Abstract: The Bayesian mastery scores as proposed by Swaminathan
et al. and the empirical Bayes mastery scores derived from Huynh's
decision-theoretic framework are compared on the basis of approximate
beta-binomial and real CTBS test data. It is found that the
two sets of mastery scores are identical or almost identical as
long as the test score distribution is reasonably symmetric or when
the true criterion level is high. Large discrepancies tend to
occur when this level is low, especially when the test scores concentrate
at some extreme scores or are fairly bumpy. However, in
terms of mastery/nonmastery decision, the Huynh procedure provides
the same classifications as the Bayesian method in practically all
situations. Moreover, the former may be used for tests of arbitrary
length and has been generalized to more complex testing situations.


Research Memorandum 79-3

Budgetary Consideration in 
Setting Mastery Scores 

Huynh Huynh 

Presented as part of the symposium "Setting standards: Theory and
practice" sponsored jointly by the American Educational Research
Association and the National Council on Measurement in Education at
their annual meetings in San Francisco, April 8-12, 1979.

Abstract: A general model along with four illustrations is presented
for the consideration of budgetary constraints in the setting of
cutoff scores in instructional programs involving remedial actions
regarding poor test performers. Budgetary constraints normally put
an upper limit on any choice of cutoff score. Given relevant information,
this limit may be determined. Alternately, ways to assess
the budgetary consequences associated with a given cutoff score are
provided. Such information would be useful in any final decision 
regarding the cutoff score. 
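
The idea that a budget places an upper limit on the cutoff score can be sketched under a deliberately simple assumption: every examinee scoring below the cutoff receives remediation at the same fixed cost. The flat cost, the function names, and the input layout are hypothetical and are not taken from the general model or the four illustrations of the memorandum.

```python
def remediation_cost(score_counts, cutoff, cost_per_student):
    """Total cost when every examinee scoring below `cutoff` is remediated.

    score_counts[x] is the number of examinees with score x (x = 0..n).
    Assumption: a flat cost per remediated examinee.
    """
    return sum(score_counts[:cutoff]) * cost_per_student


def max_affordable_cutoff(score_counts, cost_per_student, budget):
    """Highest cutoff score whose remediation cost does not exceed the budget."""
    n = len(score_counts) - 1
    best = 0  # a cutoff of 0 fails no one and therefore costs nothing
    for cutoff in range(1, n + 2):
        if remediation_cost(score_counts, cutoff, cost_per_student) <= budget:
            best = cutoff
        else:
            break  # cost is nondecreasing in the cutoff, so stop here
    return best
```

Evaluating `remediation_cost` over a range of cutoffs gives the kind of budgetary consequence table that a decision maker could inspect before settling on a final cutoff.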



Research Memorandum 79-4 

A Class of Mastery Scores Based 
on the Bivariate Normal Model 

Huynh Huynh 

Proceedings of the 1979 meeting of the American Statistical
Association (Social Statistics Section). 

Abstract: This study touches on some aspects of the determination of
mastery scores on the basis of the bivariate normal test model.
The loss ratio associated with classification errors is assumed to
be constant, and the referral success function ranges in the normal
ogive family. Alternately, the model also provides a fairly simple
way to assess the loss consequences associated with each mastery
score. Such information is deemed useful to the test user who may
wish to examine these consequences before making a final choice of
cutoff score. It is also noted that the model provides a latent
trait analysis for testing/measurement situations involving 
instructed and noninstructed groups, or pretest and posttest data. 


Research Memorandum 79-5

An Approximation to the True Ability Distribution 
in the Binomial Error Model and Applications 

Huynh Huynh 
Garrett K. Mandeville 

Abstract: Assuming that the density p of the true ability $\theta$ in
the binomial test score model is continuous in the closed interval
[0,1], a Bernstein polynomial can be used to uniformly approximate
p. Then via quadratic programming techniques, least-squares estimates
may be obtained for the coefficients defining the polynomial.
The approximation, in turn, will yield estimates for any indices
based on the univariate and/or bivariate density function associated
with the binomial test score model. Numerical illustrations are 
provided for the projection of decision reliability and proportion 
of success in mastery testing. 



Research Memorandum 79-6 

Statistical Inference for False Positive and 
False Negative Error Rates in Mastery Testing 

Huynh Huynh 

Psychometrika, March 1980.

Abstract: This paper describes an asymptotic inferential procedure
for the estimates of the false positive and false negative error 
rates. Formulae and tables are described for the computation of 
the standard errors. A simulation study indicates that the asymp- 
totic standard errors may be used even with samples of 25 cases as 
long as the Kuder-Richardson Formula 21 reliability is reasonably 
large. Otherwise, a large sample would be required. 
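
As a point of reference for the quantities being estimated, the following sketch computes the two error rates themselves under an assumed beta-binomial model: a false positive is an examinee with true ability below a criterion level who nevertheless passes, and a false negative is one at or above that level who fails. The beta prior, the criterion level `theta_0`, the midpoint-rule integration, and all names are assumptions made for illustration; the asymptotic standard errors that are the subject of the paper are not reproduced here.

```python
from math import comb, exp, lgamma, log


def beta_pdf(t: float, a: float, b: float) -> float:
    """Density of a Beta(a, b) distribution at 0 < t < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(t) + (b - 1) * log(1 - t))


def prob_pass(t: float, n: int, c: int) -> float:
    """P(X >= c) for a binomial score on n items with true ability t."""
    return sum(comb(n, x) * t ** x * (1 - t) ** (n - x) for x in range(c, n + 1))


def error_rates(n, c, theta_0, a, b, steps=2000):
    """Return (false positive rate, false negative rate).

    Integrates over true ability with a midpoint rule, which avoids
    evaluating the beta density at the endpoints 0 and 1.
    """
    h = 1.0 / steps
    false_pos = 0.0
    false_neg = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        weight = beta_pdf(t, a, b) * h
        if t < theta_0:
            false_pos += weight * prob_pass(t, n, c)
        else:
            false_neg += weight * (1.0 - prob_pass(t, n, c))
    return false_pos, false_neg
```

With a uniform ability distribution, a one-item test, and a criterion level of 0.5, both rates equal 0.125, which provides a quick check on the integration.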


Research Memorandum 79-7

An Empirical Bayes Approach to Decisions 
Based on Multivariate Test Data 

Huynh Huynh 

Presented at the annual meeting of the Psychometric Society, Iowa 
City, Iowa, May 28-30, 1980. 

Abstract: A general framework for making mastery/nonmastery
decisions based on multivariate test data is described in this
study. Overall, mastery is granted (or denied) if the posterior
expected loss associated with such action is smaller than the one
incurred by the denial (or grant) of mastery. An explicit form for
the cutting contour which separates mastery and nonmastery states
in the test score space is given for multivariate test scores which
follow a normal distribution with a constant loss ratio. For the
case involving multiple cutting scores in the true ability space,
the test score cutting contour will resemble the boundary defined
by multiple test cutting scores when the test reliabilities are
reasonably close to unity. For tests with low reliabilities, decisions
may very well be based simply on a suitably chosen composite
score.



Research Memorandum 80-1 

A Comparison of Two Approaches to Setting Passing 
Scores Based on the Nedelsky Procedure 

Joseph C. Saunders 
Joseph P. Ryan 
Huynh Huynh 

Presented at the annual conference of the Eastern Educational 
Research Association, Norfolk, Virginia, March 5-8, 1980. Applied
Psychological Measurement (in press).

Abstract: The Nedelsky procedure has been proposed as a method for
setting minimum passing scores for multiple-choice tests, based on 
an analysis of item content. Two versions of the procedure are 
compared. Two groups of judges, one using each version, set passing 
scores for a classroom test. Comparisons are based on (1) the 
distributions of passing scores, (2) the consistency of pass-fail
decisions between the two versions, and (3) the consistency of pass- 
fail decisions between each version and the passing score estab- 
lished by the test designer. In addition, the relationship between 
the passing score set by a judge and that judge's level of achievement
in the content area is investigated.
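
For readers unfamiliar with the basic Nedelsky computation, a minimal sketch follows. For each multiple-choice item, a judge marks the options that a minimally competent examinee should be able to reject; the item's contribution is the reciprocal of the number of options that remain, and the passing score is the sum of these contributions over items. The function names and data layout are hypothetical, and the sketch represents only the generic procedure, not either of the two versions compared in the memorandum.

```python
def nedelsky_item_value(n_options: int, n_eliminated: int) -> float:
    """Chance of a correct answer after eliminating the rejected options."""
    remaining = n_options - n_eliminated
    if remaining < 1:
        raise ValueError("at least the correct option must remain")
    return 1.0 / remaining


def nedelsky_passing_score(items) -> float:
    """Sum of item values; `items` is a list of (n_options, n_eliminated)."""
    return sum(nedelsky_item_value(o, e) for o, e in items)


def panel_passing_score(judgments) -> float:
    """Average the passing scores set by several judges.

    `judgments` is a list with one item list per judge.
    """
    scores = [nedelsky_passing_score(items) for items in judgments]
    return sum(scores) / len(scores)
```

For a three-item test judged as `[(4, 2), (4, 3), (5, 1)]`, the item values are 0.5, 1.0, and 0.25, giving a passing score of 1.75.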


Research Memorandum 80-2

Adequacy of Asymptotic Normal Theory in Estimating Reliability 
for Mastery Tests Based on the Beta-Binomial Model 

Huynh Huynh 

Abstract: Simulated data based on five test score distributions
indicate that a slight modification of the asymptotic normal theory
for the estimation of the p and kappa indices in mastery testing
will provide results which are in close agreement with those based
on small samples. The modification is achieved through the multiplication
of the asymptotic standard errors of estimate by the
constant $1 + m^{-3/4}$, where m is the sample size.



Research Memorandum 80-3 

Considerations for Sample Size in Reliability
Studies for Mastery Tests 

Joseph C. Saunders 
Huynh Huynh 

Presented at the annual conference of the Eastern Educational 
Research Association, Norfolk, Virginia, March 5-8, 1980. 

Abstract: In most reliability studies, the precision of a reliability
estimate varies inversely with the number of examinees
(sample size). Thus, to achieve a given level of accuracy, some 
minimum sample size is required. An approximation for this minimum 
size may be made if some reasonable assumptions regarding the mean 
and standard deviation of the test score distribution can be made. 
To facilitate the computations, tables are developed based on the 
Comprehensive Tests of Basic Skills. The tables may be used for 
tests ranging in length from five to thirty items, with percent
cutoff scores of 60%, 70%, or 80%, and with examinee populations 
for which the test difficulty can be described as low, moderate, 
or high, and the test variability as low or moderate. The tables 
also reveal that for a given degree of accuracy, an estimate of 
kappa would require a considerably greater number of examinees 
than would an estimate of the raw agreement index. 


Research Memorandum 80-4

A Note on Decision-Theoretic 
Coefficients for Tests 

Huynh Huynh 

Abstract: A modification is suggested for the decision-theoretic
coefficient $\delta$ proposed by van der Linden and Mellenbergh. Under
reasonable assumptions, the modified index varies from 0 to 1
inclusive. It is argued that in many practical applications of
mastery testing, coefficients such as $\delta$ are not readily available,
and consistency of decisions may serve as evidence of the quality 
of the decision-making process. 



Research Memorandum 80-5 

Assessing Efficiency of Decisions 
in Mastery Testing 

Huynh Huynh 

Abstract: Two indices are proposed for assessing the efficiency of
decisions in mastery testing. The indices are generalizations of
the raw agreement index and the kappa index. Both express the
reduction in the proportion of average loss (or the gain in utility)
resulting from the use of test scores to make decisions.
Empirical data are presented which show little discrepancy between 
estimates based on the beta-binomial and compound binomial models 
for one index. 



Research Memorandum 80-6 

Selecting Items and Setting Passing Scores for Mastery Tests 
Based on the Two-Parameter Logistic Model 

Huynh Huynh 

Presented at the Informal Meeting on Model-Based Psychological Measurement
sponsored by the Office of Naval Research, Iowa City, Iowa, August 17-22, 1980.

Abstract: Three issues in mastery testing are considered, using a
minimax decision framework, based on the two-parameter logistic 
model. The issues are: (1) setting passing scores, (2) assessing 
decision efficiency, and (3) selecting items to maximize decision 
efficiency. The losses or disutilities under consideration have a 
constant or normal ogive form. It is found that, in the context of 
minimax decisions, the item selection procedure based on maximum 
information may not provide the best decision efficiency.


Research Memorandum 80-7



Assessing Test Sensitivity in Mastery Testing 



Huynh Huynh 



A preliminary version of this paper was presented as part of the 
symposium "Approaches to test design for the assessment of the 
effectiveness of educational programs" sponsored by the American 
Educational Research Association at its annual meeting in Boston, 
April >-22, 1980. 

Abstract: This paper addresses the concept of test sensitivity
within the context of mastery testing. It is argued that 
correlation-based indices may not be appropriate for the assessment 
of test sensitivity. Global assessment of test sensitivity may be 
carried out via indices such as p-max or $\delta$-max. Local measures of
sensitivity may be described via a two-parameter logistic model. 
Procedures are described to check the tenability of test sensitivity 
on the basis of observed test data. 



Relationship between Decision Accuracy and
Decision Consistency in Mastery Testing

Huynh Huynh
Joseph C. Saunders

Abstract: In mastery testing, decision accuracy refers to the
proportion of examinees who are classified correctly, in one of
several achievement categories, by test data. Decision consistency
expresses the extent to which decisions agree across two test
administrations. Based on twelve cases involving a wide range of
reliabilities, it was found that decision accuracy and decision
consistency were almost perfectly related.


IV. CONCLUDING REMARKS

As the readers of this summary may note, the work of the 
Mastery Testing Project has focused on the very basic technical 
issues encountered in using test scores for making decisions 
regarding individual students. The work blended mathematical rigor 
with the ambiguity typically encountered in the reality of testing. 
Oftentimes, advanced mathematics was used, supplemented with com- 
puter simulation based on real test data collected from the South 
Carolina Statewide Testing Program. It is hoped that the many 
results reported herein will contribute to the best use of testing 
in the educational enterprise. 


A NONRANDOMIZED MINIMAX SOLUTION FOR PASSING SCORES
IN THE BINOMIAL ERROR MODEL

Huynh Huynh 
University of South Carolina 



Psychometrika, June 1980.

ABSTRACT 

A nonrandomized minimax solution is presented for passing 
scores in the binomial error model. The computation does not
require prior knowledge regarding an individual examinee or group
test data for a population of examinees. The optimum passing score
minimizes the maximum risk which would be incurred by misclassifications.
A closed-form solution is provided for the case of constant
losses, and tables are presented for a variety of situations
including linear and quadratic losses. A scheme which allows for 
correction for guessing is also described. 



1. INTRODUCTION 

Much interest has been generated in recent years on the setting 
of passing (mastery or cutoff) scores. Situations in which passing 
scores are needed include (a) entrance requirements for an instruc- 
tional program, (b) advancement of students from one instructional 
unit to the next, presumably more complex unit, (c) certification
for occupations and the professions, and (d) minimum competency
testing legislated in several states. Most procedures for setting
passing scores fall into three broad categories: comparisons with
the performance of other individuals (e.g., using norm-referenced
data), an examination of item content (e.g., such procedures as the
Nedelsky scheme), and a consideration of the consequences incurred
by misclassifications. A fairly comprehensive review of some of
these procedures may be found in Meskauskas (1976) and in Hambleton,
Swaminathan, Algina, and Coulson (1978).

This paper has been distributed separately as RM 78-2, December, 1978.

Misclassifications may be characterized by their probabilities
of occurrence and losses. The papers by Fhanér (1974) and by
Wilcox (1976) consider the selection of passing scores and of test
length which would set maximum tolerable limits for the percents of
false positive and false negative errors in decision. Both papers
rely on the concept of indifference zones centered around the minimum
true ability for mastery, and the procedures so presented may
be generalized to include the case of arbitrary but constant losses.
As subsequently described, the Fhanér-Wilcox presentation may be
framed within the minimax context in statistical decision theory. 

A simultaneous consideration of false positive errors, false 
negative errors, and losses — often referred to as the decision- 
theoretic approach to setting passing scores — is presented in a 
number of sources including Swaminathan, Hambleton, and Algina 
(1975); Huynh (1976, 1977); and van der Linden and Mellenbergh
(1977). These papers take into account knowledge concerning the
true ability of the examinees, and therefore may be applicable when
passing scores are to be set for a group of examinees. The procedure
advanced by Swaminathan et al. (1975) is based on the assumption
of exchangeability of prior information as described in Lindley
and Smith (1972) and implemented in Novick, Lewis, and Jackson
(1973). It requires specification of how much prior information is
exchangeable. On the other hand, solutions proposed by Huynh (1976,
1977) may be classified as Bayes or empirical Bayes. The first
qualifier applies to the case of the individual examinee, when the
prior distribution regarding his ability must be available. This
distribution may be assessed via procedures described in Novick and 
Jackson (1974) and implemented via the CAM system (Novick, Isaacs, 
and DeKeyrel, 1977). The second category, empirical Bayes, may be 
used when test data are available for a group of examinees. 

The empirical Bayes approach seems appropriate where past data 
or data collected in field testing are used for setting passing 
scores for future examinees who will take the same test or alter- 
nate forms of the same test. There are, however, situations in 
which such group data or prior information about the individual 
examinee may not be appropriate. This is the case of individualized
instructional programs. Here decisions regarding mastery or nonmastery
for an individual examinee ought to be based solely on the
subject's test score, not on the performance of other examinees 
who happen to be in the same situation. 

The present paper focuses on a minimax approach to setting 
passing scores. This procedure does not require specification of 
prior information regarding the ability of an individual examinee
or group of examinees. Using this procedure, a passing score may 
be established prior to any administration of the test. Section 2 
of this paper presents the overall minimax framework for binary 
classifications. In subsequent sections, various illustrations are 
provided, based on the binomial error model. 

2. BASIC ELEMENTS OF THE MINIMAX PROCEDURE 

The true ability of a given examinee is defined as $\theta$ with
range $\Omega$. For the binomial error model (Lord & Novick, 1968,
chap. 23), $\theta$ is the proportion of items in a large item pool that
the examinee is expected to answer correctly, and $\Omega$ is the interval
[0,1]. If a test is administered to the examinee, it is assumed
that his observed test score x is distributed according to a conditional
density $f(x \mid \theta)$. In subsequent discussions, the notation
$P(A \mid \theta)$ denotes the conditional probability that x is in A given
that the true ability is $\theta$.
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A referral task (Huynh, 1976) shall be assumed to exist. The 
task Is operationally defined via a nondecreasing function s(0) 
which specifies the probability that an examinee with true ability 
0 will succeed in performing the task. The referral task may be 
real or hypothetical . For example, if the test scores reflect 
achievement in the current instructional unit, then the next, pre- 
sumably more advanced, unit may serve as the referral task. This 
may be the case, for example, if instructional units are hierarch- 
ically sequenced according to the level of complexity (Huynh and 
Perney, 1979). In other situations, such as minimum competency 
testing, a consensus on what constitutes an acceptable level of 
performance may be conceptualized as a referral task. To be spe- 
cific, let it be agreed that in order to qualify as a true master, 
an examinee must have a true ability of at least 0 q . The** the 

referral success function may be taken as s(9) * 0 for 0 < 0 and 

o 

s(6) - 1 for 0 > 0 . The constant 0 is referred to as a criterion 
— o o 

level by Hambleton and Novick (1973) and a true mastery score by 
Huynh (1976). 

The examinee will be classified in either the mastery status
(action $a_1$) or the nonmastery status (action $a_2$) on the basis of
the test score x and by relying on some decision rule c. Given a
specific true ability score $\theta$, test scores may take a variety of
values in a certain range. Hence, for each examinee, actions $a_1$
and $a_2$ may both have positive probabilities of being chosen. These
probabilities sum to one since either $a_1$ or $a_2$ must be taken. The
performance of the examinee on the referral task may be deemed
success (true state $b_1$) or failure (true state $b_2$). If the true
state is $b_1$, then action $a_1$ should be taken. For $b_2$, $a_2$ should be
selected. For these two cases, each course of action taken is the
best, hence no (opportunity) losses are involved. On the other
hand, the combination $(a_1,b_2)$ constitutes a false positive decision,
and $(a_2,b_1)$ a false negative classification. Let the loss
associated with $(a_1,b_2)$ be $C_f(\theta)$ and that incurred by $(a_2,b_1)$ be $C_s(\theta)$.
These losses are functions of a particular true ability $\theta$. At this
true ability, $b_1$ occurs with probability $s(\theta)$ and $b_2$ with probability
$1 - s(\theta)$. Hence, the loss is expected to be $C_f(\theta)(1-s(\theta))$ for
taking action $a_1$, and $C_s(\theta)s(\theta)$ for taking action $a_2$.

Consider the decision rule denoted by c. This rule partitions
the range of the test scores into two disjoint subsets: $A_1$ (for
action $a_1$), and $A_2$ (for action $a_2$), with conditional
probabilities $P(A_1|\theta)$ and $P(A_2|\theta)$, respectively. For an examinee with true
ability $\theta$, the expected loss associated with c is

$$L(c,\theta) = C_f(\theta)\,(1-s(\theta))\,P(A_1|\theta) + C_s(\theta)\,s(\theta)\,P(A_2|\theta). \tag{1}$$

Let

$$M(c) = \sup_{\theta} L(c,\theta). \tag{2}$$

Then the minimax decision rule $c_o$ is the one which corresponds to
the minimum (if it exists) of M(c) when c ranges in the space
consisting of all possible decision rules. This paper, however, will
restrict itself to the case of nonrandomized decision rules.

More details regarding the minimax principle and its relation- 
ship with Bayesian decision procedures (as implemented in Huynh 
(1976), for example) may be found in Ferguson (1967). The reader 
may note that, in a number of situations, there exists a (least 
favorable) prior distribution on the true ability such that the 
corresponding Bayes solution is exactly the same as the minimax 
decision rule. 

The remaining portion of this paper will deal only with the 
binomial error model when it is used with a 0-1 form for the 
referral success function. The binomial error model appears to be 
applicable when the test given to each examinee can be thought of 
as a random sample of items drawn from a large item pool. On the 
other hand, the 0-1 form for $s(\theta)$ implies a consensus on a minimum
level of mastery on the true ability continuum.

3. THE BINOMIAL ERROR MODEL WITH 0-1 REFERRAL SUCCESS 

Consider the case where $s(\theta) = 0$ for $\theta < \theta_o$ and $s(\theta) = 1$ for
$\theta \ge \theta_o$. In the simple context of mastery testing, the inequality
"$\theta < \theta_o$" describes a true nonmastery state whereas the inequality
"$\theta \ge \theta_o$" indicates a true mastery state. In other words, $\theta_o$ is the
minimum true ability that an examinee must have in order to qualify
for true mastery in the domain of content under consideration. It
follows that the expected loss associated with the decision rule c
as specified in (1) becomes

$$L(c,\theta) = \begin{cases} C_f(\theta)\,P(A_1|\theta) & \text{if } \theta < \theta_o \\ C_s(\theta)\,P(A_2|\theta) & \text{if } \theta \ge \theta_o. \end{cases} \tag{3}$$

Now let

$$L_1(c) = \sup_{\theta<\theta_o} C_f(\theta)\,P(A_1|\theta)$$

and

$$L_2(c) = \sup_{\theta\ge\theta_o} C_s(\theta)\,P(A_2|\theta);$$

then

$$M(c) = \max\{L_1(c), L_2(c)\}.$$

Suppose that for a fixed $\theta$, the distribution of x follows the
binomial density function $f(x) = \binom{n}{x}\theta^x(1-\theta)^{n-x}$. This is called the
binomial error model (Lord & Novick, 1968). Such a distribution
belongs to the monotone likelihood ratio family (Ferguson, 1967,
chap. 5). Under fairly general conditions regarding $C_f(\theta)$ and
$C_s(\theta)$, the search for a nonrandomized minimax rule $c_o$ may be
confined to the class of partitions of the test score range
$A_1 = \{x; x \ge c\}$ and $A_2 = \{x; x \le c-1\}$ defined by a cutoff score c.
The cutoff score $c_o$, which corresponds to the minimax rule $c_o$, will
be referred to as the minimax passing score. There are two
degenerate cases which correspond to c = 0 and c = n + 1. When c = 0,
$A_2$ is empty, and hence the examinee is declared a master regardless
of his test score. On the other hand, $A_1$ is empty if c = n + 1.
For this situation, mastery is always denied.

It follows that the minimax passing score may be found by
minimizing the function $M(c) = \max\{L_1(c), L_2(c)\}$ where

$$L_1(c) = \sup_{\theta<\theta_o} C_f(\theta) \sum_{x=c}^{n} \binom{n}{x}\theta^x(1-\theta)^{n-x} \tag{4}$$

and

$$L_2(c) = \sup_{\theta\ge\theta_o} C_s(\theta) \sum_{x=0}^{c-1} \binom{n}{x}\theta^x(1-\theta)^{n-x}. \tag{5}$$

The following section will provide the detailed computations 
for the case of constant losses. 

4. THE BINOMIAL ERROR MODEL WITH 0-1 
REFERRAL SUCCESS AND CONSTANT LOSSES 

Let $\epsilon_1$ and $\epsilon_2$ be two suitably chosen nonnegative constants
such that $0 < \theta_o - \epsilon_1 < \theta_o + \epsilon_2 < 1$. Without loss of generality,
the case of constant losses may be specified as follows:

$$C_f(\theta) = \begin{cases} 1 & \text{if } \theta < \theta_o - \epsilon_1 \\ 0 & \text{if } \theta_o - \epsilon_1 \le \theta < \theta_o \end{cases}$$

and

$$C_s(\theta) = \begin{cases} Q & \text{if } \theta_o + \epsilon_2 \le \theta \\ 0 & \text{if } \theta_o \le \theta < \theta_o + \epsilon_2. \end{cases}$$

Thus the region $\theta \in [\theta_o - \epsilon_1, \theta_o + \epsilon_2)$ is an indifference zone. For
an examinee with a true ability within this region, it does not
matter whether action $a_1$ or $a_2$ is taken. It may be noted that the
constant Q is the ratio of the loss caused by a false negative
decision to that incurred by a false positive decision (i.e.,
$Q = C_s(\theta)/C_f(\theta)$).

It can be verified that the functions $L_1(c)$ and $L_2(c)$ as
detailed in (4) and (5) are given as

$$L_1(c) = \sum_{x=c}^{n} \binom{n}{x}(\theta_o-\epsilon_1)^x(1-\theta_o+\epsilon_1)^{n-x} \tag{6}$$

and

$$L_2(c) = Q \sum_{x=0}^{c-1} \binom{n}{x}(\theta_o+\epsilon_2)^x(1-\theta_o-\epsilon_2)^{n-x}. \tag{7}$$

For the general case where $\epsilon_1$ and $\epsilon_2$ are not zero, the search for
the minimax passing score $c_o$ may be accomplished by computing the
value of $M(c) = \max\{L_1(c), L_2(c)\}$ for each value c = 0, 1, 2, ..., n+1,
and then selecting the value $c_o$ at which M(c) is the smallest.

Numerical Example 

Assume n = 5, $\theta_o$ = .80, $\epsilon_1$ = .10, $\epsilon_2$ = .05, and Q = .80.
Table 1 reports the values of $L_1$, $L_2$, and M at the passing scores
of 0, 1, 2, 3, 4, 5, and 6. Note that both 0 and 6 are degenerate
passing scores.

                               TABLE 1

               Values of the Functions L_1, L_2, and M

                              Passing Score
Function      0       1        2        3        4        5       6

L_1(c)        1    .99757   .96922   .83692   .52822   .16807     0
L_2(c)        0    .00006   .00178   .02129   .13183   .44503    .80
M(c)          1    .99757   .96922   .83692   .52822   .44503    .80

The minimax passing score is $c_o = 5$. All computations were carried
out with a table of cumulative binomial distributions.
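
The entries of Table 1 can be recomputed without binomial tables. The following is a minimal Python sketch of Equations (6) and (7), not the program used in this report; the function names `tail` and `constant_loss_rows` are introduced here only for illustration.

```python
from math import comb

def tail(n, lo, hi, p):
    """P(lo <= X <= hi) for X ~ Binomial(n, p); 0 when the range is empty."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(lo, hi + 1))

def constant_loss_rows(n, theta0, eps1, eps2, q):
    """Rows (c, L1, L2, M) for c = 0..n+1, following Equations (6) and (7)."""
    rows = []
    for c in range(n + 2):
        l1 = tail(n, c, n, theta0 - eps1)          # false positive side, Eq. (6)
        l2 = q * tail(n, 0, c - 1, theta0 + eps2)  # false negative side, Eq. (7)
        rows.append((c, l1, l2, max(l1, l2)))
    return rows

# Parameters of the numerical example: n = 5, theta_o = .80,
# eps_1 = .10, eps_2 = .05, Q = .80.
rows = constant_loss_rows(5, 0.80, 0.10, 0.05, 0.80)
for c, l1, l2, m in rows:
    print(f"c={c}  L1={l1:.5f}  L2={l2:.5f}  M={m:.5f}")

# The minimax passing score is the cutoff with the smallest M(c).
c0 = min(rows, key=lambda r: r[3])[0]
print("minimax passing score:", c0)
```

Because M(c) is evaluated at every cutoff from 0 to n+1, the two degenerate passing scores are covered by the same loop.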

The aforementioned discussion encompasses part of the
presentation by Wilcox (1976) regarding the length and passing score of a
mastery test. Table I of the Wilcox paper provides minimax passing
scores for the following combinations: n = 8 (1) 20, $\theta_o$ (Wilcox's
$\pi_o$) = .70 (.05) .85, $\epsilon_1 = \epsilon_2$ (Wilcox's c) = .05, .10, and Q = 1.
The maximum expected loss, $M(c_o)$, associated with the minimax
passing score is obtained by subtracting from one the minimum
probability of a correct decision as tabulated in Wilcox's Table I.
For example, with n = 10, $\theta_o$ = .75, $\epsilon_1 = \epsilon_2 = .05$, and Q = 1, the
minimax passing score is $c_o = 8$. The corresponding maximum expected
loss is $M(c_o) = 1 - .6172 = .3828$.

The remaining part of this paper will focus on the case
$\epsilon_1 = \epsilon_2 = 0$. It follows from Equations (6) and (7) that

$$M(c) = \max\{L_1(c),\; Q\,(1-L_1(c))\}$$

where

$$L_1(c) = \sum_{x=c}^{n} \binom{n}{x}\theta_o^x(1-\theta_o)^{n-x}. \tag{8}$$

If the test score x were continuous, the minimax passing score $c_o$
would be the one at which $L_1(c) = Q\,(1-L_1(c))$. In other words, it
would satisfy the equation

$$\sum_{x=c}^{n} \binom{n}{x}\theta_o^x(1-\theta_o)^{n-x} = \frac{Q}{1+Q}. \tag{9}$$

If this equation has an integer solution $c_o$, then $c_o$ is the minimax
passing score. Otherwise, let $c'_o$ be the smallest integer such that

$$\sum_{x=c'_o}^{n} \binom{n}{x}\theta_o^x(1-\theta_o)^{n-x} \le \frac{Q}{1+Q}. \tag{10}$$

The minimax passing score will be either $c'_o$ or $c'_o - 1$ (or possibly
both), whichever minimizes the maximum expected loss M(c).
Numerical Example 

Let n = 10, $\theta_o$ = .70, and Q = .5. Then via a table of
cumulative binomial distributions, it may be found that $c'_o = 9$. At the
cutoff score 9, M(c) = .4253, and at the other cutoff score 8
($= c'_o - 1$), M(c) = .3828. Thus the minimax passing score is $c_o = 8$.
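
The two-candidate search of this example can be sketched in a few lines of Python. The sketch assumes that Inequation (10) reads $L_1(c) \le Q/(1+Q)$, which is the reading implied by the numbers of the example; the function names are hypothetical.

```python
from math import comb

def upper_tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p); 0 when c > n."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c, n + 1))

def passing_score_no_zone(n, theta0, q):
    """Return (c'_o, c_o, M(c_o)) for the case eps_1 = eps_2 = 0."""
    bound = q / (1 + q)
    # Smallest integer c at which L1(c) <= Q/(1+Q); c = n+1 always qualifies.
    c_prime = next(c for c in range(n + 2) if upper_tail(n, c, theta0) <= bound)

    def m(c):
        l1 = upper_tail(n, c, theta0)
        return max(l1, q * (1 - l1))

    # The minimax passing score is c'_o or c'_o - 1, whichever has smaller M.
    candidates = [c for c in (c_prime - 1, c_prime) if c >= 0]
    c0 = min(candidates, key=m)
    return c_prime, c0, m(c0)

c_prime, c0, m0 = passing_score_no_zone(10, 0.70, 0.5)
print(c_prime, c0, round(m0, 4))
```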

Now let I(p,q;t) denote the incomplete beta function as
tabulated in Pearson (1934) and implemented via computer routines such
as BDTR of the IBM Scientific Subroutine Package (1971) or MDBETA
of the International Mathematical and Statistical Library (1977).
Inequation (10) may now be written as

$$I(c'_o,\; n-c'_o+1;\; \theta_o) \le \frac{Q}{1+Q}. \tag{11}$$

This inequality is reminiscent of the one defining the Bayes
(or empirical Bayes) passing score for the beta-binomial model as
presented in Huynh (1976, p. 70-72). In fact, let us impose on the
true ability $\theta$ the prior beta density with parameters $\alpha$ and $\beta$.
Then the Bayes (or empirical Bayes) passing score is the smallest
integer $c_1$ at which

$$I(\alpha+c_1,\; n+\beta-c_1;\; \theta_o) \le \frac{Q}{1+Q}. \tag{12}$$

It appears from (11) and (12) that the minimax passing score $c_o$ and
the Bayes passing score $c_1$ do not differ by more than one unit if
$\beta = 1$ and if $\alpha$ is sufficiently small.

A special note is due for the case Q = 1, i.e., when the
consequences associated with false positive decisions and false negative
decisions are weighted equally. Equation (9) or Inequation (10)
indicates that the minimax passing score $c_o$ would be chosen such
that, for an examinee with true ability $\theta_o$, chances are about equal
that he would be classified as a master or a nonmaster on the basis
of the test score.

Finally, a normal approximation is available for reasonably
large n and for $\theta_o$ not too close to 0 or 1. Let $\xi$ be the 100/(1+Q)
percentile of the unit normal distribution. The minimax passing
score may be approximated by the quantity

$$c_o \approx n\theta_o + \xi\,\bigl(n\theta_o(1-\theta_o)\bigr)^{1/2}.$$
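
As a rough illustration, the approximation can be evaluated with the Python standard library. The sketch assumes the formula is read as $c_o \approx n\theta_o + \xi(n\theta_o(1-\theta_o))^{1/2}$, with $\xi$ the 100/(1+Q) percentile of the unit normal distribution; the function name is hypothetical, and the result is a real number that still has to be compared with the neighboring integer cutoffs.

```python
from math import sqrt
from statistics import NormalDist

def normal_approx_passing_score(n, theta0, q):
    """Normal approximation to the minimax passing score (eps_1 = eps_2 = 0)."""
    xi = NormalDist().inv_cdf(1 / (1 + q))   # 100/(1+Q) percentile of N(0, 1)
    return n * theta0 + xi * sqrt(n * theta0 * (1 - theta0))

# Parameters of the earlier exact example: n = 10, theta_o = .70, Q = .5.
approx = normal_approx_passing_score(10, 0.70, 0.5)
print(round(approx, 3))
```

For these parameters the approximation falls between 7 and 8, next to the exact minimax passing score of 8 found in the numerical example above.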

5. THE BINOMIAL ERROR MODEL WITH 0-1 REFERRAL SUCCESS
AND POWER LOSSES CENTERING AROUND $\theta_o$

Consider now the loss functions $C_f(\theta) = (\theta_o-\theta)^{p_1}$ for $\theta < \theta_o$
and $C_s(\theta) = Q(\theta-\theta_o)^{p_2}$ for $\theta \ge \theta_o$, where $p_1$, $p_2$, and Q are positive
constants. Linear losses correspond to $p_1 = p_2 = 1$, and squared
error losses are obtained by letting $p_1 = p_2 = 2$. At the cutoff
score c, we have

$$L_1(c) = \sup_{\theta<\theta_o} (\theta_o-\theta)^{p_1} \sum_{x=c}^{n} \binom{n}{x}\theta^x(1-\theta)^{n-x}$$

and

$$L_2(c) = \sup_{\theta\ge\theta_o} Q\,(\theta-\theta_o)^{p_2} \sum_{x=0}^{c-1} \binom{n}{x}\theta^x(1-\theta)^{n-x}.$$

For the special case c = 0, $L_1(c) = \theta_o^{p_1}$ and $L_2(c) = 0$, hence
$M(c) = \theta_o^{p_1}$. On the other hand, when c = n+1, $L_1(c) = 0$ and
$L_2(c) = Q(1-\theta_o)^{p_2}$, hence $M(c) = Q(1-\theta_o)^{p_2}$. For other situations
where $1 \le c \le n$, it may be shown that there exist two values $\theta_1$
and $\theta_2$, $0 < \theta_1 < \theta_o < \theta_2 < 1$, such that at each cutoff c,

$$L_1(c) = (\theta_o-\theta_1)^{p_1} \sum_{x=c}^{n} \binom{n}{x}\theta_1^x(1-\theta_1)^{n-x} \tag{13}$$

and

$$L_2(c) = Q\,(\theta_2-\theta_o)^{p_2} \sum_{x=0}^{c-1} \binom{n}{x}\theta_2^x(1-\theta_2)^{n-x}. \tag{14}$$

As in all previous discussions, $M(c) = \max\{L_1(c), L_2(c)\}$. The
minimax passing score $c_o$ is the one at which the maximum expected
loss M(c) is minimized.

The determination of $\theta_1$ and $\theta_2$ at each cutoff score c may be
carried out via numerical approximation procedures such as the
Newton-Raphson algorithm for solving nonlinear equations.

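5.1. Searching for $L_1(c)$

Consider now the function

$$Z_1(\theta) = \sum_{x=c}^{n} \binom{n}{x}\theta^x(1-\theta)^{n-x}.$$

The first derivative $Z_1'$ of $Z_1$ with respect to $\theta$ is given as

$$Z_1'(\theta) = \sum_{x=c}^{n} \binom{n}{x}\bigl(x\,\theta^{x-1}(1-\theta)^{n-x} - (n-x)\,\theta^x(1-\theta)^{n-x-1}\bigr).$$

Taking into account that

$$\binom{n}{x}x = n\binom{n-1}{x-1}$$

and

$$\binom{n}{x}(n-x) = n\binom{n-1}{x},$$

it follows that

$$Z_1'(\theta) = n\left[\sum_{x=c}^{n} \binom{n-1}{x-1}\theta^{x-1}(1-\theta)^{n-x} - \sum_{x=c}^{n-1} \binom{n-1}{x}\theta^x(1-\theta)^{n-x-1}\right]$$

or

$$Z_1'(\theta) = c\binom{n}{c}\theta^{c-1}(1-\theta)^{n-c}.$$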
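
The closed form $Z_1'(\theta) = c\binom{n}{c}\theta^{c-1}(1-\theta)^{n-c}$ can be checked numerically. The short Python sketch below compares it with a central finite difference of $Z_1$; the values of n, c, and $\theta$ are arbitrary choices made only for the check.

```python
from math import comb

def z1(n, c, theta):
    """Z_1(theta): upper binomial tail P(X >= c | theta)."""
    return sum(comb(n, x) * theta**x * (1 - theta)**(n - x)
               for x in range(c, n + 1))

def z1_prime(n, c, theta):
    """Closed form of the derivative of Z_1 with respect to theta."""
    return c * comb(n, c) * theta**(c - 1) * (1 - theta)**(n - c)

n, c, theta, h = 10, 8, 0.6, 1e-6            # illustrative values only
closed = z1_prime(n, c, theta)
numeric = (z1(n, c, theta + h) - z1(n, c, theta - h)) / (2 * h)
print(closed, numeric)
```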



Now let

$$H_1(\theta) = (\theta_o-\theta)^{p_1}\,Z_1(\theta).$$

Then the value $\theta_1$ of $\theta$ which maximizes $H_1(\theta)$ satisfies the equation
$H_1'(\theta_1) = 0$, where

$$H_1'(\theta) = -p_1(\theta_o-\theta)^{p_1-1}Z_1(\theta) + (\theta_o-\theta)^{p_1}Z_1'(\theta).$$

In other words, $\theta_1$ satisfies the equation $D_1(\theta_1) = 0$, where

$$D_1(\theta) = -p_1\sum_{x=c}^{n} \binom{n}{x}\theta^x(1-\theta)^{n-x} + c\binom{n}{c}(\theta_o-\theta)\,\theta^{c-1}(1-\theta)^{n-c}. \tag{15}$$

To solve this equation via the Newton-Raphson algorithm, the
derivative $D_1'(\theta)$ is needed. It is given as

$$D_1'(\theta) = c\binom{n}{c}\theta^{c-2}(1-\theta)^{n-c-1}\,G_1(\theta) \tag{16}$$

where

$$G_1(\theta) = -(p_1+1)\,\theta(1-\theta) + (\theta_o-\theta)\bigl(c-1-(n-1)\theta\bigr) \tag{17}$$

or

$$G_1(\theta) = (n+p_1)\theta^2 - \bigl(p_1+c+(n-1)\theta_o\bigr)\theta + (c-1)\theta_o. \tag{18}$$

Consider first the situation where c > 1. It may be seen from
(17) that $G_1(0) = (c-1)\theta_o > 0$ and $G_1(\theta_o) = -(p_1+1)\theta_o(1-\theta_o) < 0$.
Hence it may be seen that $G_1(\theta)$ vanishes at only one point, $\theta^*$,
between 0 and $\theta_o$. The value of $\theta^*$ is given as

$$\theta^* = \frac{p_1+c+(n-1)\theta_o - \bigl\{(p_1+c+(n-1)\theta_o)^2 - 4(n+p_1)(c-1)\theta_o\bigr\}^{1/2}}{2(n+p_1)}.$$

It follows that $D_1'(\theta)$ is positive when $0 < \theta < \theta^*$ and negative when
$\theta^* < \theta < \theta_o$. In other words, $D_1(\theta)$ is increasing when $0 < \theta < \theta^*$,
is decreasing when $\theta^* < \theta < \theta_o$, and reaches a maximum at $\theta = \theta^*$.
Since $D_1(0) = 0$, $D_1(\theta^*) > 0$. On the other hand, $D_1(\theta_o) < 0$ as may
be seen from (15). Hence $D_1(\theta) = 0$ at only one point $\theta_1$, where
$\theta^* < \theta_1 < \theta_o$. By entering c = 1 directly in Equation (15), it may
also be argued that $D_1(\theta) = 0$ at only one point $\theta_1$, somewhere between
$\theta = 0$ and $\theta_o$.

The above discussion indicates that the value $\theta_1$ may be obtained
via the Newton-Raphson iteration procedure with input data $D_1(\theta)$ and
$D_1'(\theta)$ computed via (15), (16), and (17). The iteration process has
been found to converge if the suitably chosen starting value for $\theta$
is somewhere between $\theta^*$ and $\theta_o$.
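
Since $D_1$ is positive at $\theta^*$ and negative at $\theta_o$, the root $\theta_1$ can also be located by plain bisection on that interval. The Python sketch below uses bisection in place of the Newton-Raphson iteration described above, purely because it needs no derivative; the function names are hypothetical.

```python
from math import comb, sqrt

def upper_tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c, n + 1))

def d1(theta, n, c, theta0, p1):
    """D_1(theta) of Equation (15)."""
    return (-p1 * upper_tail(n, c, theta)
            + c * comb(n, c) * (theta0 - theta)
            * theta**(c - 1) * (1 - theta)**(n - c))

def theta_star(n, c, theta0, p1):
    """Smaller root of G_1 in Equation (18), where D_1 reaches its maximum."""
    h = p1 + c + (n - 1) * theta0
    return (h - sqrt(h * h - 4 * (n + p1) * (c - 1) * theta0)) / (2 * (n + p1))

def theta_one(n, c, theta0, p1, tol=1e-10):
    """Root of D_1 between theta* and theta_o, found by bisection."""
    lo, hi = theta_star(n, c, theta0, p1), theta0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if d1(mid, n, c, theta0, p1) > 0:
            lo = mid          # root lies to the right of mid
        else:
            hi = mid          # root lies at or to the left of mid
    return (lo + hi) / 2

def l1_power(n, c, theta0, p1):
    """L_1(c) of Equation (13), evaluated at theta_1."""
    t1 = theta_one(n, c, theta0, p1)
    return (theta0 - t1)**p1 * upper_tail(n, c, t1)

# For c = n, Equation (15) reduces to theta_1 = n * theta_o / (n + p_1).
print(theta_one(10, 10, 0.70, 0.5), 10 * 0.70 / 10.5)
print(l1_power(10, 8, 0.70, 0.5))
```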



5.2. Searching for $L_2(c)$

In the expression defining $L_2(c)$ at the beginning of this
section, let $\zeta_o = 1-\theta_o$, $\zeta = 1-\theta$, y = n-x, and d = n-c+1. It then
may be seen that

$$L_2(c) = Q \sup_{\zeta\le\zeta_o} (\zeta_o-\zeta)^{p_2} \sum_{y=d}^{n} \binom{n}{y}\zeta^y(1-\zeta)^{n-y}.$$

It follows that the search for $\theta_2$, and hence $L_2(c)$, may be conducted
in the same way as in the locating of $\theta_1$.
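
The reflection above means that a single routine for the false positive side serves both $L_1(c)$ and $L_2(c)$. The following Python sketch illustrates this for the power losses of Section 5. It replaces the Newton-Raphson search by a dense grid search over $\theta$, which is cruder but short; the function names and the grid size are assumptions made for illustration.

```python
from math import comb

def upper_tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p); 1 when c = 0 and 0 when c > n."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c, n + 1))

def sup_power_loss(n, c, theta0, p, grid=1000):
    """Grid approximation to sup over 0 <= theta < theta0 of
    (theta0 - theta)^p * P(X >= c | theta)."""
    best = 0.0
    for i in range(grid):
        theta = theta0 * i / grid
        best = max(best, (theta0 - theta)**p * upper_tail(n, c, theta))
    return best

def minimax_power_loss(n, theta0, p1, p2, q, grid=1000):
    """Return (c_o, M(c_o)) over the cutoffs c = 0, 1, ..., n+1."""
    best_c, best_m = None, float("inf")
    for c in range(n + 2):
        l1 = sup_power_loss(n, c, theta0, p1, grid)
        # Section 5.2: with zeta = 1 - theta and d = n - c + 1, the false
        # negative side has the same form as the false positive side.
        l2 = q * sup_power_loss(n, n - c + 1, 1 - theta0, p2, grid)
        m = max(l1, l2)
        if m < best_m:
            best_c, best_m = c, m
    return best_c, best_m

# Parameter sets taken from Examples 1 and 2 of Section 8.
print(minimax_power_loss(10, 0.70, 0.5, 0.5, 0.75))
print(minimax_power_loss(10, 0.70, 0.5, 0.5, 0.25))
```

The degenerate cutoffs need no special handling here, because the tail probability is 1 at c = 0 and 0 at c = n+1.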

6. A FRAMEWORK OF CORRECTION FOR GUESSING

Consider now the case where each test item has A alternatives,
and let us assume that an examinee without knowledge on a given item
will randomly choose one of the A alternatives as his response.
Thus the framework of knowledge-or-random-guessing is used in the
present section.

As in previous sections, let $\theta$ be the true proportion of items
that an examinee has knowledge of and would respond correctly to if
given. Since the examinee guesses randomly on the remaining items
(which account for a proportion $1-\theta$), and since each item has A
alternatives, the proportion of items that would be answered
correctly by pure guessing is $(1-\theta)/A$. Thus an examinee with true
ability $\theta$ will actually have a probability of $t = \theta+(1-\theta)/A$ to
answer correctly each item of the pool of items from which the test
is assembled. It may be noted that since $0 \le \theta \le 1$, $1/A \le t \le 1$.

Now let $\theta_o$, $p_1$, and $p_2$ have the same meaning as in the
beginning of Section 5, and let

$$t_o = \theta_o + (1-\theta_o)/A.$$

Then it may be seen that

$$\theta - \theta_o = \frac{A}{A-1}\,(t - t_o)$$

and hence

$$L_1(c) = \left(\frac{A}{A-1}\right)^{p_1} \sup_{1/A \le t < t_o} (t_o-t)^{p_1} \sum_{x=c}^{n} \binom{n}{x}t^x(1-t)^{n-x} \tag{19}$$

and

$$L_2(c) = Q\left(\frac{A}{A-1}\right)^{p_2} \sup_{t \ge t_o} (t-t_o)^{p_2} \sum_{x=0}^{c-1} \binom{n}{x}t^x(1-t)^{n-x}. \tag{20}$$

For the two degenerate cases c = 0 and c = n+1, the maximum
expected loss M(c) takes the values

$$M(0) = \left(\frac{A}{A-1}\right)^{p_1} (t_o - 1/A)^{p_1}$$

and

$$M(n+1) = Q\left(\frac{A}{A-1}\right)^{p_2} (1-t_o)^{p_2}.$$

As for $1 \le c \le n$, the search for $L_2(c)$ of (20) may be conducted via
the procedure described in Section 5.2. The value $L_1(c)$ from (19),
with the constraint $1/A \le t < t_o$, may be obtained by going through
the steps described in Section 5.1 to obtain the maximum of the
function

$$g(t) = (t_o-t)^{p_1} \sum_{x=c}^{n} \binom{n}{x}t^x(1-t)^{n-x}$$

under the constraint $t < t_o$ and the value $t^*$ at which the maximum
occurs. If $t^* \ge 1/A$, then

$$L_1(c) = \left(\frac{A}{A-1}\right)^{p_1} g(t^*).$$

On the other hand, if $t^* < 1/A$, then

$$L_1(c) = \left(\frac{A}{A-1}\right)^{p_1} g(1/A).$$

As in other cases, $M(c) = \max\{L_1(c), L_2(c)\}$ and the minimax passing
score is the one at which M(c) is the smallest.

Numerical Example 

Let n = 15, $\theta_o$ = .60, A = 4, $p_1 = p_2 = .5$, and Q = .25. The
minimax passing score is 12. Without correction for guessing, the
minimax passing score would be 11.
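
The correction for guessing amounts to working on the t scale, rescaling both losses by powers of A/(A-1), and restricting the false positive search to $t \ge 1/A$. A minimal Python sketch under these assumptions follows. As in the earlier illustration it uses a grid search instead of Newton-Raphson, and it borrows from the program of Appendix B the convention that A = 0 means no correction; the function names are hypothetical.

```python
from math import comb

def upper_tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c, n + 1))

def sup_on_interval(n, c, t0, p, t_min, grid=500):
    """Grid approximation to sup over t_min <= t < t0 of
    (t0 - t)^p * P(X >= c | t)."""
    best = 0.0
    for i in range(grid):
        t = t_min + (t0 - t_min) * i / grid
        best = max(best, (t0 - t)**p * upper_tail(n, c, t))
    return best

def minimax_with_guessing(n, theta0, a, p1, p2, q, grid=500):
    """Minimax passing score; a is the number of alternatives per item,
    and a = 0 switches the correction for guessing off."""
    if a:
        t0, scale, floor = theta0 + (1 - theta0) / a, a / (a - 1), 1 / a
    else:
        t0, scale, floor = theta0, 1.0, 0.0
    best_c, best_m = None, float("inf")
    for c in range(n + 2):
        # Equation (19): t is restricted to [1/A, t_o).
        l1 = scale**p1 * sup_on_interval(n, c, t0, p1, floor, grid)
        # Equation (20), reflected as in Section 5.2: 1 - t runs over [0, 1 - t_o).
        l2 = q * scale**p2 * sup_on_interval(n, n - c + 1, 1 - t0, p2, 0.0, grid)
        m = max(l1, l2)
        if m < best_m:
            best_c, best_m = c, m
    return best_c

# Parameters of the numerical example, with and without the correction.
with_correction = minimax_with_guessing(15, 0.60, 4, 0.5, 0.5, 0.25)
without_correction = minimax_with_guessing(15, 0.60, 0, 0.5, 0.5, 0.25)
print(with_correction, without_correction)
```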
7. RELATIONSHIP BETWEEN MINIMAX PASSING
SCORES AND OTHER PARAMETERS

Extensive computations as well as the examination of Appendix A
reported in Section 8 reveal that, other things being the same, the
minimax passing score is a nondecreasing function of n, $\theta_o$, and $p_2$
and a nonincreasing function of A, $p_1$, and Q. These trends seem to
be justified intuitively. For example, a low Q or a high $p_2$ will
reduce the consequences incurred with a false negative error;
hence, a higher passing score might be needed to dampen the overall
expected loss associated with the decision problem. On the other
hand, high values of $p_1$ will reduce the consequences of a false
positive error, thus making a lower passing score tolerable. As
for the number A of alternatives, a low value for A will provide 
opportunity for some extra probability of getting a correct answer 
beyond the true ability of the examinee. Thus it would be sensible 
to increase the passing score in order to offset this unwarranted 
benefit. 

8. TABLES OF MINIMAX PASSING SCORES 

The computations described in Sections 5 and 6 may be imple- 
mented where computer facilities are available. A FORTRAN IV 
routine will be described in the next section. In a number of 
instances, however, a passing score might be needed quickly. 
Appendix A presents a set of tables of passing scores for the case 
of no correction for guessing (Section 5) only. 

All computations were carried out via the FORTRAN program
described in Section 9. The tables are set up with the presumption
that the false-negative consequences are less serious than those
incurred by false positive errors. The parameter Q is set at .25,
.50, .75, and 1.00. Sixteen combinations of $p_1$ and $p_2$ are used,
namely those in which these parameters vary from .50 to 2.00 in steps
of .50. The number of items is set at n = 3 (1) 20, and the
criterion level at $\theta_o$ = .50 (.05) .90.

It is possible to get a passing score of n+1, especially when
$\theta_o$ is large and/or Q is small. Such a mastery score indicates that
nonmastery is always declared regardless of test score. This
peculiarity is due to the discontinuous nature of the binomial
probability density and produces the seeming paradox noted in the
papers by Novick and Lewis (1974, p. 153-154) and by Wilcox (1976,
p. 362, footnote) and in Section 10 of this report. In a practical
sense, the peculiarity may be avoided by (i) not allowing $\theta_o$ to be
unrealistically high, and (ii) not letting the loss associated with
one type of error in decision (false positive or false negative)
dominate that associated with the other type of error.

In a number of instances, it may be possible to deduce a pass- 
ing score for nontabled entries by taking advantage of the relation- 
ships described in Section 7. 

Example 1 

Let n = 10, $p_1 = p_2 = .5$, and Q = .75. At $\theta_o$ = .70 and .75,
the passing score is 8. Hence for all $\theta_o$ between .70 and .75, it
may be assumed that the passing score is also 8.

Example 2

Let n = 10, $p_1$ = .5, $\theta_o$ = .70, and Q = .25. At both $p_2$ = .5
and 1.0, the passing score is 9. It may be assumed that the same
passing score holds for any $p_2$ between the two given values.

9. COMPUTER PROGRAM 

A FORTRAN IV routine for passing score computations based on 
Sections 5 and 6 is listed in Appendix B. The program requires 
two packaged subroutines, DRTNI from the Scientific Subroutine 
Package (1971) and MDBIN of the International Mathematical and 
Statistical Library (1977). 

The main part of the program contains an attempt to solve
Equation (15) iteratively at each c via the Newton-Raphson procedure
for nonlinear equations, as implemented by DRTNI. A good starting
value for $\theta$ is required for convergence; therefore, the following
steps are built into the program.

1. First, the value $\theta^*$ of Section 5.1 is computed.

2. The interval $(\theta^*, \theta_o)$ will then be divided into N equal
   intervals using (N-1) points. The value of $D_1(\theta)$ of (15)
   is computed at successive dividing points until two
   points, $\theta_a$ and $\theta_b$, are found such that the product
   $D_1(\theta_a)D_1(\theta_b) \le 0$.

3. Then the interval $(\theta_a, \theta_b)$ will be subdivided in M equal
   intervals in order to search for two successive dividing
   points $\theta_t$, $\theta_s$ such that $D_1(\theta_t)D_1(\theta_s) \le 0$.

4. Finally, the starting value for DRTNI is set at
   $(\theta_t + \theta_s)/2$.

In the construction of the tables of Section 8, the following
values were used: N = 20 and M = 50. The tolerance for $\theta$ was set
at EPS = .0001. Subroutine DRTNI converged in all cases listed in
the tables. For long tests along with $\theta_o$ very near 0 or 1, an M
larger than 50 might be needed for convergence.
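
The four steps above can be sketched in Python as follows. This is an illustration of the two-stage bracketing followed by a generic Newton-Raphson iteration, not a transcription of DRTNI or of the program in Appendix B; the function names are hypothetical, and the defaults N = 20, M = 50, and EPS = .0001 are the values quoted above.

```python
from math import comb, sqrt

def upper_tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c, n + 1))

def d1(theta, n, c, theta0, p1):
    """D_1(theta) of Equation (15)."""
    return (-p1 * upper_tail(n, c, theta)
            + c * comb(n, c) * (theta0 - theta)
            * theta**(c - 1) * (1 - theta)**(n - c))

def d1_slope(theta, n, c, theta0, p1):
    """Derivative of D_1 from Equations (16) and (18); needs 0 < theta < 1."""
    g = ((n + p1) * theta**2 - (p1 + c + (n - 1) * theta0) * theta
         + (c - 1) * theta0)
    return c * comb(n, c) * theta**(c - 2) * (1 - theta)**(n - c - 1) * g

def first_sign_change(f, lo, hi, parts):
    """Scan equal subintervals of (lo, hi); return the first one on which
    f changes sign (or a zero-width interval at hi if none is found)."""
    step = (hi - lo) / parts
    left, f_left = lo, f(lo)
    right = lo
    for i in range(1, parts + 1):
        right = lo + i * step
        f_right = f(right)
        if f_left * f_right <= 0:
            break
        left, f_left = right, f_right
    return left, right

def theta_one(n, c, theta0, p1, n_coarse=20, m_fine=50, eps=1e-4, max_iter=200):
    f = lambda th: d1(th, n, c, theta0, p1)
    h = p1 + c + (n - 1) * theta0
    star = (h - sqrt(h * h - 4 * (n + p1) * (c - 1) * theta0)) / (2 * (n + p1))  # step 1
    a, b = first_sign_change(f, max(star, 1e-20), theta0, n_coarse)              # step 2
    t, s = first_sign_change(f, a, b, m_fine)                                    # step 3
    theta = (t + s) / 2                                                          # step 4
    for _ in range(max_iter):          # Newton-Raphson from the starting value
        step = f(theta) / d1_slope(theta, n, c, theta0, p1)
        theta -= step
        if abs(step) < eps:
            break
    return theta

root = theta_one(10, 8, 0.70, 0.5)
print(root, d1(root, 10, 8, 0.70, 0.5))
```

The lower end of the search interval is kept slightly above zero, as in the listed program, so that the derivative remains defined when c = 1.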

10. A SEEMING PARADOX 
Consider the mastery decision defined by the parameters n = 3,
$\theta_o = .8$, $p_1 = p_2 = .5$, and Q = .25. The nonrandomized minimax
passing score is 3, at which the maximum expected loss M(c) is .218.
Now let us suppose that the decision has been carried out on a
continuous random variable Y independent of the ability $\theta$ of the
examinee. Let c be any cutoff score. Then

$$L_1(c) = \sup_{\theta<\theta_o} (\theta_o-\theta)^{p_1}\,P(Y > c) = .89443\,P(Y > c)$$

and

$$L_2(c) = Q \sup_{\theta\ge\theta_o} (\theta-\theta_o)^{p_2}\,P(Y < c) = .11180\,(1-P(Y > c)).$$

It follows that the maximum expected loss M(c) is minimized when
$L_1(c) = L_2(c)$, at which P(Y > c) = .111 and M(c) = .100. Thus, as
judged by the minimax principle, the decision rule of randomly
assigning mastery status with an 11.1 percent probability and
nonmastery status with an 88.9 percent probability is better than
that based on the test score!
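
The arithmetic behind this random assignment rule is short enough to check directly. The Python sketch below solves $a\,P = b\,(1-P)$ for the two constants $a = \theta_o^{p_1}$ and $b = Q(1-\theta_o)^{p_2}$; it assumes Q = .25, the value implied by the constant .11180, and the function name is hypothetical.

```python
def random_assignment_rule(theta0, p1, p2, q):
    """Probability of declaring mastery that equalizes L1 and L2 when the
    decision ignores the test score, and the resulting maximum loss."""
    a = theta0**p1              # sup of (theta0 - theta)^p1 over theta < theta0
    b = q * (1 - theta0)**p2    # Q * sup of (theta - theta0)^p2 over theta >= theta0
    prob_mastery = b / (a + b)  # solves a * P = b * (1 - P)
    return a, b, prob_mastery, a * prob_mastery

a, b, prob_mastery, max_loss = random_assignment_rule(0.8, 0.5, 0.5, 0.25)
print(round(a, 5), round(b, 5), round(prob_mastery, 3), round(max_loss, 3))
```

With these inputs the two constants come out as .89443 and .11180, the probability of declaring mastery is 1/9, and the maximum expected loss is about .10.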
The apparent paradox is actually caused by the restriction of
the decision problem to the class of nonrandomized classifications
defined by the passing scores of 0, 1, ..., n, n+1. A similar
contradiction is also displayed in a paper by Wilcox (1976) in
which the minimum probability of a correct decision is not an
increasing function of the number of test items.

The paradox, however, may be resolved by a consideration of 
the entire class of randomized decision rules. It is well known 
(Ferguson, 1967, Section 2.8) that under fairly general conditions, 
there always exists a randomized decision rule which is as good as 
or better than a given nonrandomized decision rule. Randomized 
minimax decisions, unfortunately, seem harder to approach than 
nonrandomized decisions. 

11. SUMMARY 

In this report, solutions are provided for the setting of
passing scores within the context of nonrandomized decisions based on
the binomial test score model. No assumption is required regarding
the true ability distribution of the individual examinee or of the
group of examinees under study. The model assumes that the test is
formed by a random selection of items from a large (real or
hypothetical) pool of items. In addition, it requires specification of
the minimum true ability for mastery and of consequences incurred
by misclassification errors. A scheme for correction-for-guessing
within the minimax framework is also presented. Tables and
descriptions of a computer program are also provided to facilitate the
determination of passing scores.

APPENDIX A

Tables of Minimax Passing Scores
in the Binomial Error Model

Table of Minimax Mastery Scores in the Binomial Error Model
with $p_1$ = 0.5 and $p_2$ = 0.5

Table of Minimax Mastery Scores in the Binomial Error Model
with $p_1$ = 0.5 and $p_2$ = 1.0

APPENDIX B

SUBROUTINE MIMAX 

This subroutine computes the minimax passing (mastery) score 
for the binomial error model in mastery testing. 

Disclaimer: The computer program hereafter listed has been written
with care and tested extensively under a variety of conditions. The
author, however, makes no warranty as to its accuracy and
functioning, nor shall the fact of its distribution imply such warranty.

      SUBROUTINE MIMAX(N,TA,IA,P1,P2,Q,IZ)
C
C***********************************************************************
C
C     THIS SUBROUTINE COMPUTES THE MINIMAX PASSING (MASTERY) SCORE FOR
C     THE BINOMIAL ERROR MODEL IN MASTERY TESTING.
C
C     INPUT DATA ARE:
C       N    NUMBER OF TEST ITEMS
C       TA   CRITERION LEVEL (THETA ZERO)
C       IA   NUMBER OF OPTIONS (ALTERNATIVES) FOR EACH MULTIPLE-
C            CHOICE ITEM.  THIS INFORMATION IS NEEDED IF CORRECTION
C            FOR GUESSING IS TO BE PERFORMED.  IF NO CORRECTION FOR
C            GUESSING IS REQUIRED, SET IA = 0.
C       P1   EXPONENT FOR FALSE POSITIVE ERROR LOSS
C       P2   EXPONENT FOR FALSE NEGATIVE ERROR LOSS
C       Q    WEIGHTING CONSTANT FOR FALSE NEGATIVE ERROR LOSS
C
C     OUTPUT DATA IS
C       IZ   MINIMAX PASSING (MASTERY) SCORE
C
C     SUBROUTINES REQUIRED:
C       DRTNI FROM SSP (NEWTON-RAPHSON ITERATION PROCESS)
C       MDBIN FROM IMSL (BINOMIAL PROBABILITY)
C
C***********************************************************************
C
      COMMON NKEEP,IC,R,TT,KODE,IOPT
      DOUBLE PRECISION FL1,FL2,FMAX,FMAX1
C
      WRITE(6,200) N,TA,IA,P1,P2,Q
  200 FORMAT('1',T4,'NUMBER OF ITEMS   =',I4/
     1       T4,'CRITERION LEVEL   =',F10.5/
     2       T4,'NUMBER OF OPTIONS =',I4/
     3       T4,'P1                =',F10.5/
     4       T4,'P2                =',F10.5/
     5       T4,'LOSS RATIO Q      =',F10.5)
      DMAX=AMIN1(1.,Q)
      NKEEP=N
C     DD IS THE SCALE FACTOR A/(A-1) OF SECTION 6 (1 IF NO CORRECTION)
      DD=IA*1./(IA-1)
      IF(IA.EQ.0) DD=1.
      X1=DD**P1
      X2=DD**P2
C     TZ IS THE CRITERION LEVEL ON THE T SCALE
      TZ=TA
      IF(IA.NE.0) TZ=TA*(1.-1./IA)+1./IA
      IC1=0
      FMAX1=1.D50
C
C     LOOP OVER THE NONDEGENERATE CUTOFF SCORES 1,...,N
      DO 10 ID=1,N
C
      IC=ID
      R=P1
      TT=TZ
      IOPT=IA
      CALL LMAX(FL1)
      FL1=FL1*X1
C     FALSE NEGATIVE SIDE VIA THE REFLECTION OF SECTION 5.2
      R=P2
      TT=1.-TZ
      IC=N-ID+1
      IOPT=-1
      CALL LMAX(FL2)
      FL2=FL2*Q
      FL2=FL2*X2
      FMAX=DMAX1(FL1,FL2)
      IF(FMAX.GE.FMAX1) GOTO 10
      IC1=ID
      FMAX1=FMAX
   10 CONTINUE
C
C     COMPARE WITH THE DEGENERATE CUTOFF SCORES 0 AND N+1
      AMAX=TZ**P1
      AMAX=AMAX*X1
      B=Q*(1.-TZ)**P2
      B=B*X2
      IX=0
      IF(AMAX.LE.B) GOTO 13
      IX=N+1
      AMAX=B
   13 IZ=IC1
      IF(AMAX.LT.FMAX1) IZ=IX
C
      WRITE(6,220) IZ
  220 FORMAT('0','MINIMAX PASSING'/3X,'SCORE =',I4)
      RETURN
      END

C
      SUBROUTINE LMAX(FL)
      COMMON N,IC,P,TZ,KODE,IA
      DOUBLE PRECISION T,F,DERF,TS,FL,T1,F1,DERF1
      EXTERNAL FCT
      XX=0.
      IF(IA.GT.0) XX=1.0/IA
      EPS=.0001
      IEND=200
      KODE=0
      NN=20
      MM=50
C     T1 IS THE VALUE THETA* OF SECTION 5.1
      H=P+IC+(N-1)*TZ
      T1=(H-SQRT(H*H-4*(N+P)*(IC-1)*TZ))/(2*(N+P))
      IF(T1.LE.0.D0) T1=1.D-20
C     COARSE SEARCH FOR A SIGN CHANGE OF D1 ON (THETA*,THETA ZERO)
      DD=(TZ-T1)/NN
      TS=T1
      CALL FCT(T1,F1,DERF1)
      DO 5 I=1,NN
      T=T1+I*DD
      CALL FCT(T,F,DERF)
      IF(F*F1.LE.0.0) GOTO 10
      TS=T
      F1=F
    5 CONTINUE
C     FINE SEARCH WITHIN THE INTERVAL FOUND ABOVE
   10 DD=(T-TS)/MM
      CALL FCT(TS,F1,DERF1)
      T1=TS
      DO 15 I=1,MM
      T=T1+I*DD
      CALL FCT(T,F,DERF)
      IF(F1*F.LE.0.) GOTO 20
      TS=T
      F1=F
   15 CONTINUE
C     STARTING VALUE FOR THE NEWTON-RAPHSON ITERATION
   20 TS=(TS+T)/2.0
      DD=T-TS
      IF(DD.LE.EPS) GOTO 25
      KODE=1
      CALL DRTNI(T,F,DERF,FCT,TS,EPS,IEND,IER)
      IF(IER.NE.0) WRITE(6,200) IER
  200 FORMAT('0','ERROR IN THE SSP SUBROUTINE DRTNI',I4)
   25 IF(IA.GT.0.AND.T.LT.XX) T=XX
      S=T
      CALL MDBIN(IC-1,N,S,D,PK,IER)
      IF(IER.NE.0) WRITE(6,210) IER
  210 FORMAT('0','ERROR IN THE IMSL SUBROUTINE MDBIN',I4)
      FL=(TZ-T)**P*(1.-D)
      RETURN
      END

C
      SUBROUTINE FCT(T,F,DERF)
      COMMON N,IC,P,TZ,KODE
      EXTERNAL BI
      INTEGER BI
      DOUBLE PRECISION T,F,DERF,G
      S=T
      LL=BI(N,IC)
C     F IS D1 OF EQUATION (15)
      F=IC*LL*(TZ-T)*T**(IC-1)*(1.D0-T)**(N-IC)
      CALL MDBIN(IC-1,N,S,D,PK,IER)
      F=-P*(1.D0-D)+F
      IF(KODE.EQ.0) RETURN
C     DERF IS THE DERIVATIVE OF D1, EQUATIONS (16) AND (18)
      DERF=0
      IF(IC.EQ.N) GOTO 10
      G=(1.D0-T)**(N-IC-1)
      IF(IC.EQ.1) GOTO 5
      DERF=(IC-1)*TZ*T**(IC-2)*G
    5 DERF=((N+P)*T**IC-(P+IC+(N-1)*TZ)*T**(IC-1))*G+DERF
      DERF=DERF*IC*LL
      RETURN
   10 DERF=N*T**(N-2)*(-(N+P)*T+(N-1)*TZ)
      RETURN
      END

C
      FUNCTION BI(N,M)
C     BINOMIAL COEFFICIENT N OVER M
      INTEGER BI
      BI=1
      IF(M*(N-M).EQ.0) RETURN
      MM=N-M
      IF(MM.GT.M) MM=M
      DO 15 J=1,MM
   15 BI=BI*(N-J+1)/J
      RETURN
      END

//LKED.SYSLIB DD
//            DD DSN=ACAD.IMSL.DP.SUBLIB,DISP=SHR
//            DD DSN=ACAD.IMSL.SP.SUBLIB,DISP=SHR
//            DD DSN=SSP.SUBLIB,DISP=SHR

BAYESIAN AND EMPIRICAL BAYES APPROACHES TO SETTING
PASSING SCORES ON MASTERY TESTS

Huynh Huynh 
Joseph C. Saunders 

University of South Carolina 

Presented at the symposium "Psychometric approaches to domain- 
referenced testing" sponsored jointly by the American Educational 
Research Association and the National Council on Measurement in 
Education at their annual meetings in San Francisco, April 8-12, 1979. 

ABSTRACT 

The Bayesian approach to setting passing scores as proposed by
Swaminathan, Hambleton, and Algina is compared with the empirical
Bayes approach to the same problem that is derived from Huynh's
decision-theoretic framework. Comparisons are based on simulated
data which follow an approximate beta-binomial distribution and on
real test data sampled from a statewide testing program. It is
found that the two procedures lead to setting identical or almost
identical passing scores as long as the test score distribution is
reasonably symmetric or when the minimum mastery level or criterion
level is high. Larger discrepancies tend to occur when this level
is low, especially when the distribution of test scores is
concentrated at a few extreme scores or when the frequencies are
irregular. However, in terms of mastery/nonmastery decisions, the two
procedures result in the same classifications in practically all
situations. The empirical Bayes procedure may be used for
tests of any length, while the Bayesian procedure is recommended
only for tests of 8 or more items. Additionally, the empirical
Bayes procedure can be generalized and applied to more complex
testing situations with less difficulty than the Bayesian procedure.

This paper has been distributed separately as RM 79-2, April, 1979.

1. INTRODUCTION 

Among the many decision-theoretic approaches to setting passing
scores (or standards) for mastery tests, there are at least two
methods which rely on test data collected from a group of examinees.
The Bayesian procedure, as presented in Swaminathan, Hambleton, and
Algina (1975), assumes that prior knowledge regarding the examinees
is exchangeable (Novick, Lewis, & Jackson, 1973) and can be
quantified in some appropriate manner. On the other hand, the empirical
Bayes approach, as formulated in Huynh (1976a), uses only the true
ability distribution of the examinees and makes no assumption
regarding prior knowledge about the examinees. Both procedures use
test data collected from a group of examinees and establish passing
scores for mastery tests by minimizing certain loss functions. The
purpose of this paper is to present a comparison of the two sets of
standards (passing scores) formulated under a variety of conditions
which can be expected to be encountered in mastery testing or in
minimum competency testing. The comparison will be made first on
the basis of approximate beta-binomial test scores. Further
comparisons will be made using the Comprehensive Tests of Basic Skills
(CTBS, 1973) data collected in the 1978 South Carolina Statewide
Testing Program.

2. AN OVERVIEW OF THE BAYESIAN AND 
EMPIRICAL BAYES APPROACHES 

Overall Framework 

The Bayesian framework as presented by Swaminathan et al. and
the special empirical Bayes procedure described in Huynh (1976a,
p. 70-73) start with a typical four-corner setup used in decision
theory. (See Figure 1, p. 78, for the basic elements of this setup.)
Let $\theta$ ($\pi$ in the notation of Swaminathan et al.) be the
true score (or true ability) of an examinee and x be the observed
test score as obtained from an n-item test. For the binomial error
model adopted in both standard setting approaches, $\theta$ is the
proportion of items in a real or hypothetical item pool that an
examinee answers correctly. Let a person be called a master if that
person's true score $\theta$ is such that $\theta \ge \theta_o$ and a
nonmaster if $\theta < \theta_o$. Here, $\theta_o$ is a given constant
which defines the lower boundary of the mastery level or the
criterion level. Since a person's true score cannot be observed
directly, decisions about whether to call the person a master must
be based on an observed test score. What remains to be determined is
the cutoff score c that will be in some sense optimal.

On the basis of the test score x, a person is called a master
if $x \ge c$ and a nonmaster if $x < c$. A correct decision is made
whenever either (a) $\theta \ge \theta_o$ and $x \ge c$, or
(b) $\theta < \theta_o$ and $x < c$. Otherwise, either a false
positive error ($\theta < \theta_o$ and $x \ge c$) or a false
negative error ($\theta \ge \theta_o$ and $x < c$) is encountered.

In the case where the loss associated with each error is
constant, generality is not diminished if we let the loss incurred by
a false positive error be equal to 1 and that associated with a
false negative error be equal to Q. Here, Q expresses the ratio of
the false negative error loss to the false positive error loss.
(In the notation of Swaminathan et al., $Q = l_{21}/l_{12}$.)

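
The four-corner setup above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the true score is passed in as if it were known, which it never is in practice.

```python
def classify(theta, x, theta_o, c):
    """Label one decision under the four-corner setup (illustrative names)."""
    is_master = theta >= theta_o   # true status, unobservable in practice
    passes = x >= c                # decision based on the observed score
    if is_master == passes:
        return "correct"
    return "false positive" if passes else "false negative"


def loss(theta, x, theta_o, c, q):
    """Constant losses: 1 for a false positive, Q for a false negative."""
    outcome = classify(theta, x, theta_o, c)
    return {"correct": 0.0, "false positive": 1.0, "false negative": q}[outcome]


# A nonmaster (theta = .50 < .60) who scores 8 against a cutoff of 7 is passed in error.
example_outcome = classify(0.50, 8, 0.60, 7)
example_loss = loss(0.90, 5, 0.60, 7, 0.25)
```

With these definitions, a loss ratio Q below 1 makes a false negative cheaper than a false positive, which is the convention of this section.
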
Bayesian Approach 

Now let an n-item test be given to m examinees. In the Bayesian
procedure as implemented by Swaminathan et al., the prior
information regarding the examinees is assumed to be exchangeable
(i.e., prior knowledge regarding one examinee can be interchanged
with that associated with another examinee without causing any
disturbance in the decision problem). The model requires knowledge
(prior belief) of the distribution of the variance of true scores
for the group. (In point of fact, an arcsine transformation of
$\theta$ is used.) This prior distribution is taken to be the inverse
chi-square distribution with parameter $\lambda$ and degrees of
freedom $\nu$. A recommended choice of $\nu$ is 8 (Novick et al., 1973).

To assess $\lambda$, let t be the number of test items which would
need to be administered to a typical examinee in order to obtain as
much information about that examinee's $\theta$ as we already have. Then,
$\lambda = 3/(2t+1)$. Wang (1973) has tables to facilitate computation in
this procedure. In the setup of the Wang tables, $\lambda/\nu$ is chosen as
.01, .02, .03, .04, and .05. These ratios correspond to the t values
of 18.25, 8.875, 5.75, 4.1875, and 3.25. Given the prior information
as revealed through $\lambda$ and $\nu$ and the test data of m subjects,
it is possible via the Wang tables to compute the two expected
losses, $\Pr(\theta < \theta_o \mid \text{test data})$ and
$Q \cdot \Pr(\theta \ge \theta_o \mid \text{test data})$, at
each test score. A Bayesian passing score is then the smallest
score at which the first expected loss is smaller than the second
one. More details may be found in Swaminathan et al. (1975) and
in Novick et al. (1973).
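
As a check on the arithmetic, the relation $\lambda = 3/(2t+1)$ can be inverted to recover the t values quoted for the Wang tables, taking $\nu = 8$ as recommended. The sketch below also writes out the decision rule; the function names are hypothetical, and the posterior probabilities are a made-up input, since in the paper they come from the Wang tables.

```python
def prior_items(lambda_over_nu, nu=8):
    """Solve lambda = 3 / (2t + 1) for t, with lambda = (lambda/nu) * nu."""
    lam = lambda_over_nu * nu
    return (3.0 / lam - 1.0) / 2.0


def bayesian_passing_score(pr_nonmaster, q):
    """Smallest score whose first expected loss is below the second.

    pr_nonmaster[x] stands for Pr(theta < theta_o | test data) at score x.
    """
    for x, p in enumerate(pr_nonmaster):
        if p < q * (1.0 - p):
            return x
    return None  # no score qualifies


t_values = [prior_items(r) for r in (0.01, 0.02, 0.03, 0.04, 0.05)]
# Hypothetical posterior probabilities for scores 0..3, for illustration only.
demo_cutoff = bayesian_passing_score([0.9, 0.6, 0.3, 0.1], q=1.0)
```
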

Empirical Bayes Approach 

The empirical Bayes solution assumes that the m examinees
constitute a random sample from a population for which the true
ability $\theta$ follows a known distributional form such as the beta
density with parameters $\alpha$ and $\beta$ (Keats & Lord, 1962, page 68).
Sample test data are used to obtain the estimates $\hat{\alpha}$ and
$\hat{\beta}$, and the results are used to compute the probability of
a false positive decision $\Pr(\theta < \theta_o, x \ge c)$ and of a
false negative decision $Q \cdot \Pr(\theta \ge \theta_o, x < c)$ at a
given cutoff score c. The optimum passing score (henceforth referred
to simply as the passing score) will be the value of c at which the
average loss, $\Pr(\theta < \theta_o, x \ge c) + Q \cdot \Pr(\theta \ge \theta_o, x < c)$,
is the smallest.
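
One way to see where the minimizing cutoff falls is to difference the average loss in c. The sketch below is a reconstruction from the definitions in this section, not a quotation from the source; $f(x)$ (the marginal probability of score x) and $I_x$ are notation introduced here.

```latex
% Notation introduced for this sketch:
%   f(x) = marginal (beta-binomial) probability of the score x
%   I_x  = \Pr(\theta < \theta_o \mid x)
\begin{align*}
L(c) &= \sum_{x \ge c} f(x)\, I_x \;+\; Q \sum_{x < c} f(x)\,(1 - I_x), \\
L(c+1) - L(c) &= f(c)\,\bigl[\,Q\,(1 - I_c) - I_c\,\bigr]
               = f(c)\,\bigl[\,Q - (1+Q)\, I_c\,\bigr].
\end{align*}
% Hence L(c+1) >= L(c) exactly when I_c <= Q/(1+Q).
% With a beta prior and binomial errors, the posterior of \theta given x = c
% is beta with parameters (\alpha + c,\; n + \beta - c), so
%   I_c = I(\alpha + c,\; n + \beta - c;\; \theta_o),
% which decreases as c increases. The average loss therefore falls until the
% first c with I_c <= Q/(1+Q) and does not fall afterwards.
```

Under these assumptions the smallest such c minimizes the average loss.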

The procedure is implemented as follows. Let $\bar{x}$ and $s$ be the
mean and standard deviation of the test scores, and let the
Kuder-Richardson reliability coefficient be defined as

$$\alpha_{21} = \frac{n}{n-1}\left[1 - \frac{\bar{x}(n-\bar{x})}{n s^2}\right].$$

Then

$$\hat{\alpha} = (-1 + 1/\alpha_{21})\,\bar{x}$$

and

$$\hat{\beta} = -\hat{\alpha} + n/\alpha_{21} - n.$$

For test scores with insufficient variability, $\alpha_{21}$ may be negative.
If this occurs, simply replace it by the smallest positive
reliability estimate which happens to be available. Let $I$ denote the
incomplete beta function as tabulated in Pearson (1934) and
implemented via computer programs such as the IBM Scientific Subroutine
Package (1971) or the IMSL (1977). Then the passing score is the
smallest integer c at which

$$I(\hat{\alpha}+c,\; n+\hat{\beta}-c;\; \theta_o) \le Q/(1+Q).$$    (1)

A normal approximation is available if there is a sufficiently
large number of items and if $\theta_o$ is not near 0 or 1. Let $\xi$ denote
the 100/(1+Q) percentile of the unit normal distribution. Then the
test passing score is nearly equal to

$$c = (n+\hat{\alpha}+\hat{\beta}-1)\,\theta_o + \xi\,[(n+\hat{\alpha}+\hat{\beta}-1)\,\theta_o(1-\theta_o)]^{1/2} - \hat{\alpha} + .5.$$    (2)

The data presented in Huynh (1976b) indicate that the passing score
computed from Equation (2) does not differ appreciably from the one
deduced from Inequation (1) when the test consists of 20 items and
when $\theta_o$ is within the range from .50 to .80.
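
The steps above can be sketched in Python using only the standard library. Several things here are assumptions of the sketch rather than the authors' program: the function names, the continued-fraction routine that stands in for the tabled or library incomplete beta function, the use of the divisor m in the variance, and the convention of returning n + 1 when no score satisfies Inequation (1). The handling of a negative $\alpha_{21}$ is left out.

```python
import math
from statistics import NormalDist


def betainc_reg(a, b, x, max_iter=300, eps=1e-12):
    """Regularized incomplete beta I(a, b; x), by a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        # Symmetry I(a, b; x) = 1 - I(b, a; 1 - x) keeps convergence fast.
        return 1.0 - betainc_reg(b, a, 1.0 - x, max_iter, eps)
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for term in (even, odd):
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_front) * h / a


def moment_estimates(freqs):
    """KR-21 and the (alpha, beta) estimates from frequencies at scores 0..n."""
    n, m = len(freqs) - 1, sum(freqs)
    mean = sum(x * f for x, f in enumerate(freqs)) / m
    var = sum(f * (x - mean) ** 2 for x, f in enumerate(freqs)) / m  # divisor m is assumed
    kr21 = n / (n - 1) * (1.0 - mean * (n - mean) / (n * var))
    alpha = (-1.0 + 1.0 / kr21) * mean
    beta = -alpha + n / kr21 - n
    return kr21, alpha, beta


def passing_score_exact(n, alpha, beta, theta_o, q):
    """Smallest integer c satisfying Inequation (1); n + 1 if none does."""
    for c in range(n + 1):
        if betainc_reg(alpha + c, n + beta - c, theta_o) <= q / (1.0 + q):
            return c
    return n + 1  # convention of this sketch only


def passing_score_normal(n, alpha, beta, theta_o, q):
    """Equation (2): normal approximation to the passing score."""
    xi = NormalDist().inv_cdf(1.0 / (1.0 + q))
    k = n + alpha + beta - 1.0
    return k * theta_o + xi * math.sqrt(k * theta_o * (1.0 - theta_o)) - alpha + 0.5


# Data Set A2 of Table 1: frequencies at scores 0..10.
a2 = [0, 0, 1, 3, 6, 10, 13, 16, 15, 11, 5]
kr21, alpha_hat, beta_hat = moment_estimates(a2)
c_exact = passing_score_exact(10, alpha_hat, beta_hat, 0.60, 0.25)
c_normal = passing_score_normal(10, alpha_hat, beta_hat, 0.60, 0.25)
```

For Data Set A2 with $\theta_o$ = .60 and Q = .25, the exact rule in this sketch gives 7, the empirical Bayes entry reported for that case in Table 2, and Equation (2) gives a value between 6.5 and 7.
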

3. A COMPARISON OF BAYESIAN AND EMPIRICAL BAYES 
PASSING SCORES FOR APPROXIMATE 
BETA-BINOMIAL TEST DATA 

The passing score obtained via the empirical Bayes approach,
as revealed by Inequation (1), is based on test score data that
follow a beta-binomial distribution. It may be of interest to
compare the Bayesian approach to setting a passing score with the
empirical Bayes approach, using test data which follow closely a
beta-binomial form.

Both the present comparison and the one detailed in the next
section are based on tests with ten items. In these comparisons,
the criterion or minimum mastery level is set at $\theta_o$ = .60, .70, and
.80. The loss ratio is chosen to be Q = .25, .50, 1.00, and 2.00.
(A loss ratio smaller than one indicates that a false negative
error is less serious than a false positive error.) To compute a
passing score via the Bayesian approach, it is necessary to specify
the ratio $\lambda/\nu$ or, equivalently, the quantity t as described in
Section 2. It may be recalled that t may be interpreted as the
number of "test items" which are believed to be as informative as
the prior belief about the examinees. In practical situations
involving standard setting, it seems unreasonable to let the prior
belief carry as much weight as the objective test data. In other
words, it is unlikely that t is too close to n. Thus for the
comparisons based on 10-item tests reported in this section and in
Section 4 as well as the comparisons based on 20-item tests
described in Section 5, the t-values are chosen to be 8.875
($\lambda/\nu$ = .02), 5.75 ($\lambda/\nu$ = .03), 4.1875 ($\lambda/\nu$ = .04), and 3.25
($\lambda/\nu$ = .05).

The first five test score frequency distributions (labeled A1
through A5 in Table 1) serve as the data base for the comparison of
the passing scores computed by the two procedures using test score
distributions that are approximately beta-binomial. Each is
deliberately chosen (i) to yield an $s_g^2$ value (variance of the
arcsine-square-root transformation of the test scores) conforming as
closely as possible to the tabulated $s_g^2$ values of the Wang tables
(so that no interpolation would be necessary) and (ii) to reflect
several degrees of skewness and variability thought to be typical of
mastery testing situations. (Also in Table 1, and explained below,
are distributions of actual test scores from the South Carolina
Statewide Testing Program.) It may be noted that in Table 1, the
quantity D(%) represents the maximum percent difference between
the observed and beta-binomial-fitted cumulative frequencies. A
small D-value indicates a good fit.
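
The D(%) statistic can be sketched as follows. This is one reading of "maximum percent difference between the observed and fitted cumulative frequencies", namely the largest absolute gap between the two cumulative proportions times 100; that reading, the moment fit with divisor m, and all function names are assumptions of the sketch.

```python
import math


def moment_fit(freqs):
    """Beta-binomial (alpha, beta) by moments, from frequencies at scores 0..n."""
    n, m = len(freqs) - 1, sum(freqs)
    mean = sum(x * f for x, f in enumerate(freqs)) / m
    var = sum(f * (x - mean) ** 2 for x, f in enumerate(freqs)) / m
    kr21 = n / (n - 1) * (1.0 - mean * (n - mean) / (n * var))
    alpha = (-1.0 + 1.0 / kr21) * mean
    return alpha, -alpha + n / kr21 - n


def beta_binomial_pmf(n, alpha, beta):
    """Beta-binomial probabilities for scores 0..n, via log-gamma."""
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    base = log_beta(alpha, beta)
    return [math.comb(n, x) * math.exp(log_beta(alpha + x, beta + n - x) - base)
            for x in range(n + 1)]


def max_cumulative_difference(freqs):
    """Largest gap, in percent, between observed and fitted cumulative proportions."""
    n, m = len(freqs) - 1, sum(freqs)
    fitted = beta_binomial_pmf(n, *moment_fit(freqs))
    obs_cum = fit_cum = worst = 0.0
    for f, p in zip(freqs, fitted):
        obs_cum += f / m
        fit_cum += p
        worst = max(worst, abs(obs_cum - fit_cum))
    return 100.0 * worst


# Data Set A2 of Table 1: frequencies at scores 0..10.
a2 = [0, 0, 1, 3, 6, 10, 13, 16, 15, 11, 5]
fitted_a2 = beta_binomial_pmf(10, *moment_fit(a2))
d_a2 = max_cumulative_difference(a2)
```
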

Table 2 reports the Bayesian passing scores and the
corresponding empirical Bayes passing scores (the fifth entry in each
cell) for several combinations of $\theta_o$, Q, and t. The data indicate
that for the situations under consideration, the Bayesian and
empirical Bayes passing scores are identical, or nearly so, as long as
the test score distribution is reasonably symmetrical (Cases A2, A4,
and A5). For highly skewed distributions (Cases A1 and A3) the two passing
TABLE 1

Frequency Distributions of Test Scores Used
in Comparisons of Passing Scores

Data                                                                   Frequency at score of
Set  Source/Subtest                           m   D(%)   S.D.   Skew.   0   1   2   3   4   5   6   7   8   9  10

Approximate Beta-Binomial
A1   Fictitious                              40    3.1   1.36   -0.61                       1   3   6   8  11  11
A2   Fictitious                              80    1.0   1.87   -0.31           1   3   6  10  13  16  15  11   5
A3   Fictitious                              40    1.2   1.01   -1.51                           1   2   4  10  23
A4   Fictitious                              40    1.6   2.01   -0.02       1   3   5   6   7   7   5   4   2   0
A5   Fictitious                              40    1.0   2.15    0.12   1   3   5   6   7   6   5   4   2   1   0

Comprehensive Tests of Basic Skills
B1   Mathematics concepts and application    20    6.7   1.28       ?                           ?   ?   ?   ?   ?
B2   Mathematics computations                20    9.2   1.45   -0.24                           3   4   3   4   6
B3   Spelling                                20    6.1   1.76   -1.04                   2   0   1   2   6   4   5
B4   Social studies                          40    6.2   2.11    0.27       1   4   5   9   5   5   6   3   1   1
B5   Language expression                     40    8.?   1.86   -0.53           1   1   5   3   4  11  10   3   2
B6   Reading                                 40    4.1   1.22   -2.12                       1   1   2   3   3  30
B7   Science                                 60    5.6   1.74   -0.22               2   6  10   8  14   8  12   0
B8   Reading vocabulary                      60    3.2   1.56   -1.75               1   0   3   1   5   5  16  29
B9   Reading vocabulary                      80    2.7   1.68   -1.49               2   1   2   5   6  11  23  30
B10  Spelling                                80    2.1   1.50   -1.44               1   0   2   4   7  12  16  38

m = total number of scores in the distribution.

D(%) represents the maximum percent difference between the observed
and beta-binomial-fitted cumulative frequencies. None is
significant at the ten percent level of significance.

scores rarely differ by more than one unit when the criterion level
$\theta_o$ is relatively high (.70 or .80) and when $\lambda/\nu$ is such that t is
not too close to n, say when $\lambda/\nu$ is at least .03. Large
discrepancies, however, may occur at a low criterion level such as .60 or
when t is close to n.
TABLE 2

Bayesian and Empirical Bayes Passing Scores for Five
Approximate Beta-Binomial Test Score Distributions

Bayesian (at $\lambda/\nu$ = .02, .03, .04, .05) and empirical Bayes
(fifth entry, in italics in the original) at

Data
Set    $\theta_o$    Q = .25           Q = .50           Q = 1.00          Q = 2.00

A1     .60            4, 5, 6, 6, 4     3, 4, 5, 5, 2     2, 3, 4, 4, 1     1, 2, 3, 3, 0
       .70            7, 8, 8, 8, 6     6, 7, 7, 7, 5     5, 5, 6, 6, 4     4, 4, 5, 5, 3
       .80           10,10,10,10, 9     9, 9, 9, 9, 8     8, 8, 8, 8, 7     7, 7, 7, 7, 6
A2     .60            7, 8, 8, 8, 7     6, 7, 7, 7, 6     5, 6, 6, 6, 5     4, 4, 5, 5, 4
       .70           10,10, 9, 9, 9     9, 9, 9, 9, 9     8, 8, 8, 8, 8     7, 7, 7, 7, 7
       .80           10,10,10,10,10    10,10,10,10,10    10,10,10,10,10     9, 9, 9, 9, 9
A3     .60            1, 3, 4, 4, 3     1, 2, 3, 3, 2     0, 1, 2, 2, 1     0, 1, 1, 2, 0
       .70            4, 5, 6, 6, 6     3, 4, 5, 5, 5     2, 3, 4, 4, 4     1, 2, 3, 3, 3
       .80            8, 8, 9, 9, 8     7, 7, 8, 8, 7     5, 6, 7, 7, 6     4, 5, 6, 6, 5
A4     .60            9, 9, 9, 9, 9     9, 8, 8, 8, 8     8, 7, 7, 7, 8     7, 6, 6, 6, 6
       .70           10,10,10,10,10    10, ?,10,10,10    10, 9, 9, 9,10     9, 9, 8, 8, 9
       .80           10,10,10,10,10    10,10,10,10,10    10,10,10,10,10    10,10,10,10,10
A5     .60           10,10, 9,10        9, 9, 9, 9, 9     8, 8, 8, 8, 8     7, 7, 7, 7, 7
       .70           10,10,10,10,10    10,10,10,10,10    10,10, 9, 9,10     9, 9, 9, 9, 9
       .80           10,10,10,10,10    10,10,10,10,10    10,10,10,10,10    10,10,10,10,10



4. A COMPARISON OF BAYESIAN AND EMPIRICAL 
BAYES PASSING SCORES FOR CTBS TEST DATA 



This phase of the study is based on a 10% systematic sample
of the entire third grade CTBS-Level C data file compiled during the
1978 South Carolina Statewide Testing Program. To obtain the
frequency distributions labeled as B1 to B10 (in Tables 1 and 3), the
following procedure was used. First, ten 10-item subtests were
assembled by random selection of items from each CTBS subtest.
Next, for each 10-item subtest, a frequency distribution was
constructed for each school district which had at least 20 students in
the systematic sample, and the corresponding $s_g^2$ value was obtained.
(The $s_g^2$ values were distributed as follows: .10 to .50 (32%), .51
to .75 (38%), .76 to 1.00 (20%), and more than 1.00 (10%). Large
$s_g^2$ values tended to associate with subtests dealing with reading
comprehension (sentences or paragraphs), language expression, and
language mechanics.) Third, among the frequency distributions with
$s_g^2$ values included between .01 and .05, ten were finally selected
and altered slightly so that the total number of examinees (m) was
exactly 20, 40, 60, or 80.

Table 3 lists the Bayesian and empirical Bayes passing scores
under a variety of conditions. As in the previous section, the data

TABLE 3

Bayesian and Empirical Bayes Passing Scores
for Ten CTBS Test Score Distributions

Bayesian (at $\lambda/\nu$ = .02, .03, .04, .05) and empirical Bayes
(fifth entry, in italics in the original) at

Data
Set    $\theta_o$    Q = .25           Q = .50           Q = 1.00          Q = 2.00

B1     .60            5, 5, 6, 6, 3     4, 4, 5, 5, 2     3, 3, 4, 4, 1     2, 2, 3, 3, 0
       .70            7, 7, 8, 8, 6     6, 6, 7, 7, 5     5, 5, 6, 6, 4     4, ?, 5, 5, 3
       .80           10,10,10,10, 9     9, 9, 9, 9, 8     8, 8, 8, 8, 7     7, 7, 7, 7, 6
B2     .60            6, 6, 6, 6, 5     5, 5, 5, 5, 4     4, 4, 4, 5, 2     3, 3, 3, 4, 1
       .70            8, 8, 8, 8, 7     7, 7, 7, 7, 6     6, 6, 6, 6, 5     5, 5, 5, 6, 4
       .80           10,10,10,10, 9     9, 9, 9, 9, 9     8, 8, 8, 8, 8     7, 7, 8, 8, 7
B3     .60            6, 6, 7, 7, 6     5, 5, 6, 6, 6     4, 4, 5, 5, 5     3, 4, 4, 4, 4
       .70            8, 8, 8, 8, 8     7, 7, 8, 8, 7     6, 7, 7, 7, 6     5, 6, 6, 6, 6
       .80           10,10,10,10,10     9, 9, 9, 9, 9     9, 9, 9, 9, 8     8, 8, 8, 8, ?
B4     .60            9, 9, 9, 9, 9     9, 8, 8, 8, 8     8, 8, 7, 7, 7     7, 7, 6, 6, 7
       .70           10,10,10,10,10    10,10,10,10,10    10, 9, 9, 9, 9     9, ?, ?, 8, 9
       .80           10,10,10,10,10    10,10,10,10,10    10,10,10,10,10    10,10,10,10,10
B5     .60            8, 8, 8, 8, 7     7, 7, 7, 7, 6     6, 6, 6, 6, 5     4, 5, 5, 5, 4
       .70           10,10, 9, 9,10     9, 9, 9, 9, 9     8, 8, 8, 8, 8     7, 7, 7, 7, 7
       .80           10,10,10,10,10    10,10,10,10,10    10,10,10,10,10     9, 9, 9, 9, 9
B6     .60            2, 3, 4, 5, 6     1, 2, 3, 4, 6     1, 2, 2, 3, 5     0, 1, 1, 2, 4
       .70            5, 5, 6, 7, 8     3, 4, 5, 6, 7     2, 3, ?, 5, 6     2, 2, 3, 4, 6
       .80            8, 8, 9, 9, 9     7, 7, 8, 8, 8     6, 6, 7, 7, 8     4, 5, 6, 6, 7
B7     .60            8, 8, 8, 8, 7     7, 7, 7, 7, 6     5, 6, 6, 6, 5     4, 5, 5, 5, 4
       .70           10,10,10,10, 9     9, 9, 9, 9, 9     8, 8, 8, 8, 8     7, 7, 7, 7, 7
       .80           10,10,10,10,10    10,10,10,10,10    10,10,10,10,10    10,10, 9, 9,10
B8     .60            3, 4, 5, 6, 6     2, 3, 4, 5, 6     2, 2, 3, 4, 5     1, 2, 2, 3, 4
       .70            6, 7, 7, 8, 8     5, 6, 6, 7, 7     4, 5, 5, 6, 6     ?, 4, 4, 5, 6
       .80            9, 9, 9, 9, 9     8, 8, 9, 9, 8     7, 7, 8, 8, 8     6, 6, 7, 7, 7
B9     .60            4, 5, 5, 6, 6     3, 4, 4, 5, 6     2, 3, 3, 4, 5     1, 2, 3, 3, 4
       .70            7, 7, 8, 8, 8     4, 6, 7, 7, 7     4, 5, 6, 6, 6     3, 4, 5, 5, 6
       .80            9,10,10,10, 9     9, 9, 9, 9, 9     8, 8, 8, 8, 8     6, 7, 7, 7, 7
B10    .60            3, 4, 5, 6, 6     2, 3, 4, 5, 5     1, 2, 3, 4, 5     1, 1, 2, 3, 4
       .70            6, 7, 7, 8, 8     5, 6, 6, 7, 7     4, 4, 5, 6, 6     3, 3, 4, 5, 5
       .80            9, 9, 9, 9, 9     8, 8, 9, 9, 8     7, 7, 8, 8, 8     6, 6, ?, 7, 7

show that the two sets of passing scores are the same, or nearly
so, as long as the test score distribution is reasonably symmetric
(see Cases B4, B5, and B7). Discrepancies in these situations are
rarely larger than one unit. For most other situations, the
difference between the two values for a passing score is seldom larger
than one unit when the criterion $\theta_o$ is .70 or .80 and when $\lambda/\nu$ is
at least .03. The same magnitude of difference, one unit, also
tends to hold at $\theta_o$ = .60 unless the test scores pile up at extreme
values (Case B6) or unless the frequencies are fairly irregular
(Case B1).

5. ADDITIONAL DATA FOR MODERATELY
SKEWED DISTRIBUTIONS

Additional comparisons were made for ten 20-item tests with
distributions having skewness ranging from -1.109 to .117 (see
Table 4). These tests were assembled in the same way as the
10-item tests described in Section 4. As in the previous sections,
the criterion level $\theta_o$ was set at .60, .70, and .80, and the loss
ratio Q at .25, .50, 1.00, and 2.00. The prior knowledge about the
examinees was assumed to be equivalent to a number of items, t, of
8.875 ($\lambda/\nu$ = .02), 5.75 ($\lambda/\nu$ = .03), 4.1875 ($\lambda/\nu$ = .04), and 3.25
($\lambda/\nu$ = .05). For all the 480 combinations under consideration, the

TABLE 4

Frequency Distribution of Scores on Ten CTBS Subtests
Mentioned in Section 5

                                      Frequency at score of
Subtest                                  5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20

Reading vocabulary                                               1   1   5   3   4   7   4   8   3   4
Spelling                                                             1   1   2   3   2   3   8  12   8
Science                                      1   1   1   3   3   4   3   1   9   4   5   2   1   1   1
Social studies                           2   0   2   0   3   1   2   2   6   9   1   4   4   1   3   0
Social studies                               1   2   5   3   3   1   6   5   4   2   2   5   0   0   1
Reading vocabulary                               2   0   0   2   1   4   4   3   3   4   8   3   4   2
Mathematics concepts and application         1   0   0   1   2   3   2   3   4   0   7   7   2   6   2
Reading vocabulary                                                   1   2   3   2   5   5   6   9   7
Social studies                           1   3   1   1   1   0   2   5   3   6   3   5   4   4   1   0
Science                                  1   1   4   2   ?   2   4   2   4   2   3   ?   3   5   0   1

absolute values of the discrepancies between the two computed
passing scores are distributed as follows: 0 (35%), 1 (37%), 2
(15%), 3 (5%), and 4 or more (8%). Hence in about three-fourths of
all situations, the Bayesian and empirical Bayes passing scores do
not differ from each other by more than one unit.
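
The count of 480 combinations and the "about three-fourths" figure follow from simple arithmetic, sketched here with hypothetical variable names. The percentages are the ones quoted in the paragraph above.

```python
from itertools import product

tests = range(10)                          # ten 20-item tests
theta_levels = (0.60, 0.70, 0.80)          # criterion levels
loss_ratios = (0.25, 0.50, 1.00, 2.00)     # values of Q
prior_ratios = (0.02, 0.03, 0.04, 0.05)    # values of lambda/nu

n_combinations = len(list(product(tests, theta_levels, loss_ratios, prior_ratios)))

# Reported percentages of absolute discrepancies between the two passing scores.
discrepancy_pct = {"0": 35, "1": 37, "2": 15, "3": 5, "4 or more": 8}
within_one_unit = discrepancy_pct["0"] + discrepancy_pct["1"]
```
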

6. AGREEMENT OF MASTERY/NONMASTERY DECISIONS

As noted in Section 4, there are situations (such as some
cases associated with the A1, B1, and B6 data sets) where the
passing scores obtained from the two methods differ appreciably. This
may seem disheartening. However, the procedures provide
mastery/nonmastery classifications which are in high agreement for most
cases under consideration. For Data Set A1 with $\theta_o$ = .60 and .70,
for example, the combined proportions of students identically
classified in either the mastery or nonmastery category by the Bayesian
procedure (with $\lambda/\nu$ = .05) and by the empirical Bayes procedure are
88%, 95%, 99%, and 100% for Q = .25, .50, 1.00, and 2.00,
respectively. Over the fifteen data sets of Table 1 and with the same
values for $\lambda/\nu$ and Q, the proportions of identical classifications
reach 94%, 96%, 98%, and 97%, respectively. As for the data of
Table 4, these proportions stand at 98%, 98%, 98%, and 97%.
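
The agreement figures for Data Set A1 can be retraced from Tables 1 and 2. In the sketch below, two cutoffs disagree only for examinees whose scores fall between them; the function name and data layout are illustrative.

```python
def count_disagreements(freqs, cut_a, cut_b):
    """Examinees classified differently by two cutoffs (pass means x >= cutoff)."""
    low, high = sorted((cut_a, cut_b))
    return sum(freqs[low:high])


# Data Set A1 (Table 1): frequencies at scores 0..10.
a1 = [0, 0, 0, 0, 0, 1, 3, 6, 8, 11, 11]
# Table 2 passing scores for A1 by criterion level, listed for
# Q = .25, .50, 1.00, 2.00: Bayesian at lambda/nu = .05, then empirical Bayes.
bayes = {0.60: [6, 5, 4, 3], 0.70: [8, 7, 6, 5]}
emp_bayes = {0.60: [4, 2, 1, 0], 0.70: [6, 5, 4, 3]}

agreement = []
for j in range(4):
    differ = sum(count_disagreements(a1, bayes[t][j], emp_bayes[t][j]) for t in bayes)
    total = len(bayes) * sum(a1)
    agreement.append(100.0 * (total - differ) / total)
```

The four values come out at 87.5, 95.0, 98.75, and 100.0, which round to the 88%, 95%, 99%, and 100% quoted above.
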

Though the overall agreement for classifications is high for
the data considered in this study, some individual cases may show
less agreement than others. These cases include situations such as
A2 with $\theta_o$ = .60, Q = .25, and $\lambda/\nu$ = .05, where the Bayesian passing
score of 8 and the empirical Bayes passing score of 7 are located
near the center of the test score distribution. The shift of only
one unit in test score in this case actually causes 16 students out
of a total of 80 to be classified differently by the two procedures.
Visible disagreement between the classifications defined by the
Bayesian and empirical Bayes procedures may occur in situations
where scores with high frequencies of occurrence are selected as
the passing scores. If this is the case, the proportion of
students classified in the mastery (or nonmastery) category is not
likely to be close to either 0% or 100%. In other situations where
most students are declared masters (Data Set A1 with $\theta_o$ = .60,
$\lambda/\nu$ = .05, and Q = 2.00) or nonmasters (Data Set A5 with $\theta_o$ = .70,
$\lambda/\nu$ = .05, and Q = 1.00), the agreement in classifications is
almost perfect.

7. DISCUSSION AND CONCLUSION 

The results described in previous sections may be summarized
as follows: (i) Bayesian passing scores and those computed via the
empirical Bayes procedure are identical or almost identical as long
as the test score frequency distribution is reasonably symmetric or
when the criterion level $\theta_o$ is sufficiently high (.70 or .80);
(ii) large discrepancies in passing scores may occur at criterion
levels of .60 (or below), especially when the test scores pile up
at a few extreme values or when the frequency distribution is
irregular; (iii) however, mastery/nonmastery decisions derived from
the two procedures are most often identical. Overall, the combined
proportion of students similarly classified by both procedures is
about 97%.

All in all, there is little difference between the Bayesian
approach as described by Swaminathan et al. and the Huynh empirical
Bayes procedure described here, either in terms of the resulting
passing scores or in terms of the mastery/nonmastery categorization.

It should be pointed out that the procedure by Swaminathan et
al. relies on a normal arcsine-square-root transformation of the
test data and is therefore considered adequate only when the test
has at least 8 items. In addition, the scheme requires the
evaluation of certain posterior probabilities. This may be done via the
MARFRO computer program (mentioned in Wang, 1973) or via the Wang
tables. To the chagrin of the writers, many frequency
distributions such as those derived from the CTBS test data of the South
Carolina Statewide Testing Program have $s_g^2$ values much larger than
the upper bound of .05 allowed in the above-mentioned tables. In
addition, the constraint of having at least 8 items seems to be
quite severe in many practical situations involving
objective-referenced testing. Such tests frequently have 5 or fewer items
per objective.

The empirical Bayes approach in its simplest form, as
presented in Huynh (1976a), requires that the test scores follow a
beta-binomial distribution. There are indications (Keats & Lord,
1962; Duncan, 1974; Huynh & Saunders, 1979; also see Table 1) that
the model adequately fits many test score distributions. Moreover,
it is known (Subkoviak, 1978; Huynh & Saunders, 197?) that the
model is useful in the estimation of the reliability of mastery
classification based on one test administration. In addition,
using the empirical Bayes approach, passing scores may be computed
for tests of any length and can be approximated quickly via
Equation (2).

It may be noted that the Bayesian and empirical Bayes
procedures discussed in this paper deal with the setting of passing
scores for a particular test. Both procedures assume the
availability of a minimum mastery or criterion level $\theta_o$ and the availability
of other information such as Q, the ratio of the loss incurred by
a false negative decision to that incurred by a false positive one.
In the context of testing for instructional purposes, $\theta_o$ may be
based on the judgment of a curriculum specialist or a knowledgeable
teacher and Q may be assessed via the time losses encountered by a
misdecision (Huynh, 1976a). The issue is much more involved for
end-of-program certification, such as high school graduation
(minimum competency) testing programs legislated in several states. The
reader is referred to Jaeger (1976) and Shepard (1976) for insight
regarding some of these issues.

The empirical Bayes approach with the availability of a
predetermined criterion level, however, is only the simplest form of
the general framework of mastery evaluation as approached by Huynh
(1976a). The essential component of this model is an external task
(real or hypothetical) that examinees are supposed to perform once
they are granted mastery of the objectives or content upon which a
test is based. Such an external task may be identified in the
context of instruction, especially when instructional units are
sequenced in some logical order. If this requirement is fulfilled,
the specification of $\theta_o$ is no longer necessary. Some suggestions
for solutions along this line have been presented elsewhere (Huynh,
1976a, p. 73-75; Huynh, 1977; Huynh & Perney, 1979). To the
knowledge of the writers, the Bayesian approach as presented by
Swaminathan et al. has not been generalized to situations other
than those involving constant losses and when a criterion level is
available. Although such a generalization may be made, the
numerical analysis would be more involved than can be expected from the
empirical Bayes approach.

As indicated previously, both standard setting procedures 
studied in this paper are based on group data and therefore are 
appropriate to the extent that minimization of loss is considered 
for the entire group of examinees. This may be the case for mini- 
mum competency testing where resources for remedial instruction are 
limited. Procedures relating to standard setting in the absence of 
group data are available (see, for example, Huynh, 1978). 

In conclusion, the empirical Bayes approach yields
mastery/nonmastery decisions identical in most cases to those based on the
Bayesian approach. In addition, the former approach is simpler in
terms of computations, is applicable to any test length, and has
been generalized to more complex testing situations.
ABSTRACT

This study touches on some aspects of the determination of
passing (cutoff, mastery) scores on the basis of the bivariate 
normal test model. The loss ratio associated with classification 
errors is assumed to be constant, and the referral success function 
is assumed to belong to the normal ogive family. Alternately the 
model also provides a fairly simple way to assess the loss conse- 
quences associated with each passing score. Such information is 
deemed useful to the test user who may wish to examine these con- 
sequences before making a final choice of passing score. 



1. INTRODUCTION

A general framework for setting passing (cutoff, mastery)
scores in binary classification (or mastery testing) has been
provided recently (Huynh, 1976). Applications of the procedure to
test data distributed as the beta-binomial model have also been
presented (Huynh, 1976, 1977). The framework assumes that the true
ability of a population of subjects may be described by a random
variable $\theta$ with probability density function $p(\theta)$. If only one
subject is involved, then $p(\theta)$ describes the prior information
regarding this subject's ability. A test is administered to the
subject and the resulting test score is denoted as x. The test
score is then compared to a passing (or cutoff) score equal to a
constant c. If x is equal to or greater than c, the subject
passes (or is declared a "master"). If x is less than c, the
subject does not pass (or is declared a "nonmaster"). The problem is
to determine a value of c which is optimum in some sense.

This paper has been distributed separately as RM 79-4, April, 1979.

The model, as proposed, postulates the availability of a
referral task which the subjects are expected to be able to perform
if they are classified as having mastered the competencies
underlying the test scores. Performance on the referral task is
categorized as success or failure. The probability of a successful
performance on the task by a subject with true ability $\theta$ is defined
via a nondecreasing function $s(\theta)$, the referral task. Each referral
task corresponds to a unique function $s(\theta)$. Conversely, from a
purely mathematical point of view, any nondecreasing function $s(\theta)$
may be conceptualized as a referral task.

The referral task, thus, may be real or hypothetical. For 
example, if an integer addition unit is to be followed by lessons 
on integer multiplication, then performance on a multiplication 
test may serve as a referral task for a test tapping the ability 
to add integers. Other illustrations of real referral tasks may also 
be found in situations where the sequence of instructional units 
forms a linear hierarchy. In a number of situations, a referral 
task can be conceptualized. For example, in minimum competency 
testing programs legislated in several states, a consensus on what 
constitutes a minimum level of performance for mastery may serve as 
a basis for a referral task. To be specific, let us agree that in 
order to qualify for mastery, an examinee must have a true ability 
of at least $\theta_o$. Then the nondecreasing function $s(\theta)$ which takes 
the value of 0 if $\theta < \theta_o$ and the value of 1 for $\theta \ge \theta_o$ mathematically 

defines the referral task for this case. The special 0-1 form for 
$s(\theta)$ has been considered by a number of writers including Hambleton 
and Novick (1973). 

Now let $C_f(\theta)$ represent the opportunity loss incurred by 
granting mastery status to a subject who will eventually fail in 
performing the referral task (a false positive error). Likewise, 
let $C_s(\theta)$ be the loss associated with the denial of mastery to a 
subject whose performance on the task would be deemed successful (a 
false negative error). Under these conditions, a reasonable choice 
for an optimum passing score would be the score $c_o$ at which the 
average loss across all subjects in the population (or the Bayes 
risk in the case of only one subject) is smallest. Details 
regarding the computation of $c_o$ may be found in Huynh (1976). 

When test scores may be assumed to follow a beta-binomial 
model and when the referral success function $s(\theta)$ is of the 0-1, 
linear, or cubic form, closed-form solutions exist for $c_o$ (Huynh, 
1976, 1977). As is well known, the binomial error model is 
appropriate when each examinee is given an independent sample of items 
(Lord and Novick, 1968, chap. 23). There are indications that 
several test score distributions might fit the beta-binomial 
framework even if examinees in each distribution respond to the same set 
of items. 

There are models other than the beta-binomial framework which 
could be used to represent test data. For example, many frequency 
distributions obtained from standardized tests are known to follow 
closely a normal distribution. Models using a bivariate normal 
distribution for the true score $\theta$ and the observed score $x$ are not 
uncommon in educational measurement and Bayesian statistical 
literature. Moreover, as an implication of the Central Limit Theorem, 
the beta-binomial distribution will resemble a bivariate normal 
distribution when the number of test items is sufficiently large. 

The purpose of this paper is to provide the computation for 
the optimum passing score (mastery score) for the bivariate normal 
test score model with constant losses and 0-1 or normal ogive $s(\theta)$. 
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Since normal test scores form a continuous scale, the optimum 
passing score $c_o$ satisfies the equation 

$$\int_{\Omega} \left\{ [C_s(\theta) + C_f(\theta)]\,s(\theta) - C_f(\theta) \right\} p(\theta|c_o)\,d\theta = 0. \tag{1}$$

In the above expression, $\Omega$ represents the sample space of $\theta$. For 
the sake of completeness, a procedure will also be proposed for 
approximating the referral success function $s(\theta)$. 

2. PASSING SCORE COMPUTATION FOR THE BIVARIATE 
NORMAL MODEL WITH CONSTANT LOSSES 
AND NORMAL OGIVE REFERRAL SUCCESS 

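Without any loss of generality, let $C_f(\theta) = 1$ and $C_s(\theta) = Q$. 
Here $Q$ expresses the ratio of the loss incurred by a false negative 
error to that associated with a false positive error. Now let the 
referral success be defined as 

$$s(\theta) = F_N\!\left(\frac{\theta - \theta_o}{\sigma}\right) \tag{2}$$

where $\theta_o$ and $\sigma$ are two constants and $F_N(\cdot)$ denotes the cumulative 
distribution function of a unit normal random variable. In addition, 
let $x$ be in its standardized form (with zero mean and unit variance). 
With $\rho$ as the test reliability, the mean and variance of $\theta$ are 
respectively 0 and $\rho$, and the correlation between $x$ and $\theta$ is 
$\rho^{1/2}$. 

It is now assumed that the vector $(\theta, x)$ follows a bivariate 
normal distribution. It may then be verified that the conditional 
density $p(\theta|c_o)$ is given as a normal density with mean $\rho c_o$ and 
variance $\rho(1-\rho)$. Equation (1) now becomes 

$$\int_{-\infty}^{+\infty} \left\{ (1+Q)\,F_N\!\left(\frac{\theta - \theta_o}{\sigma}\right) - 1 \right\} p(\theta|c_o)\,d\theta = 0$$

or 

$$\int_{-\infty}^{+\infty} F_N\!\left(\frac{\theta - \theta_o}{\sigma}\right) p(\theta|c_o)\,d\theta = \frac{1}{1+Q}. \tag{3}$$

The integral in Equation (3) may be written as 

$$A = \frac{1}{2\pi\sigma\sqrt{\rho-\rho^2}} \int_{-\infty}^{+\infty} \left\{ \int_{-\infty}^{\theta} \exp\!\left[-\frac{(t-\theta_o)^2}{2\sigma^2}\right] dt \right\} \exp\!\left[-\frac{(\theta-\rho c_o)^2}{2(\rho-\rho^2)}\right] d\theta.$$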
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This integral may be viewed as the probability of the joint event 
$(-\infty < \theta < +\infty,\ t < \theta)$ associated with two independent random variables 
$t$ and $\theta$. The random variable $t$ has mean $\theta_o$ and variance $\sigma^2$; the 
second random variable $\theta$ has mean $\rho c_o$ and variance $\rho - \rho^2$. Now the 
difference $t - \theta$ follows a normal distribution with mean $\theta_o - \rho c_o$ 
and variance $\rho - \rho^2 + \sigma^2$. Since the mentioned joint event is 
equivalent to the condition $t - \theta < 0$, it follows that the value of 
$A$ is 
$F_N\!\left((\rho c_o - \theta_o)/(\rho - \rho^2 + \sigma^2)^{1/2}\right)$. Let $\xi$ be the $100/(1+Q)$ percentile of the 
unit normal distribution, i.e., $F_N(\xi) = 1/(1+Q)$. Then $c_o$ is given as 

$$c_o = \frac{\theta_o + \xi\sqrt{\rho - \rho^2 + \sigma^2}}{\rho}. \tag{4}$$

If the test scores have mean $\mu_x$ and a standard deviation $\sigma_x$, 
then the test cutoff score is given as $C_o = \mu_x + c_o\sigma_x$. 

The following remarks may be made about Equation (4). First, 
by letting $\sigma^2 = 0$, the normal ogive $s(\theta)$ will degenerate to a 0-1 
form with the jump occurring at $\theta_o$. Thus if true nonmastery status 
is defined by $\theta < \theta_o$ and true mastery by $\theta \ge \theta_o$, then the cutoff 
score is $c_o = \theta_o/\rho + \xi[(1-\rho)/\rho]^{1/2}$. Next, when misdecisions are weighted 
equally in terms of losses (i.e., when $Q = 1$), $c_o$ and $\theta_o$ relate to 
each other via the equation $\theta_o = \rho c_o$. This expression is reminiscent 
of the Kelley formula which defines the regression of true score on 
test score (Lord and Novick, 1968, p. 65). Finally, when the 
relationship between the ability $\theta$ and the referral task is fuzzy, i.e., 
when $\sigma$ is large, the cutoff score $c_o$ will shoot sharply above the 
"central value" $\theta_o/\rho$ if $Q < 1$ and will locate appreciably below 
this central value if $Q > 1$. 

It may be noted that the unstandardized passing score $C_o$ may 
be written as 



Let $\sigma_e^2$ be the squared standard error of measurement. Then 
$\sigma_e^2 = (1-\rho)\sigma_x^2$ and 
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(5) 



Numerical Example 1 

Let $\mu_x = 100$, $\sigma_x = 15$, $\rho = .90$, $\theta_o = 1$, $\sigma = .5$, and $Q = .5$. 
Then $\xi = .432$, and $c_o = 1.391$. The raw (unstandardized) cutoff 
score is found to be $C_o = 120.86$. 
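The computation in Numerical Example 1 can be sketched in a few lines of Python. This is only an illustration of Equation (4) under the stated model; the function name `optimum_passing_score` and its argument names are hypothetical and do not come from the original report. Because the percentile is computed here to full precision rather than read from a table, the results may differ from the reported values in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def optimum_passing_score(theta_o, sigma, rho, Q, mean_x=0.0, sd_x=1.0):
    """Return (xi, c_o, C_o) following Equation (4).

    theta_o, sigma : location and spread of the normal ogive s(theta)
    rho            : test reliability (0 < rho <= 1)
    Q              : false negative loss divided by false positive loss
    mean_x, sd_x   : mean and standard deviation of the raw test scores
    """
    # xi is the 100/(1+Q) percentile of the unit normal distribution.
    xi = NormalDist().inv_cdf(1.0 / (1.0 + Q))
    # Equation (4): optimum passing score on the standardized scale.
    c_o = (theta_o + xi * sqrt(rho - rho**2 + sigma**2)) / rho
    # Convert to the raw-score scale: C_o = mu_x + c_o * sigma_x.
    C_o = mean_x + c_o * sd_x
    return xi, c_o, C_o


xi, c_o, C_o = optimum_passing_score(theta_o=1.0, sigma=0.5, rho=0.90, Q=0.5,
                                     mean_x=100.0, sd_x=15.0)
print(f"xi = {xi:.3f}, c_o = {c_o:.3f}, C_o = {C_o:.2f}")
```

Setting `sigma=0.0` in the same call gives the cutoff for the 0-1 referral success function.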

3. ESTIMATION PROCEDURE FOR 
NORMAL OGIVE REFERRAL SUCCESS 

Now let $g(x,1)$ be the proportion of subjects who have a test 
score of $x$ and succeed in performing the referral task. Then from 
Equation (13) of Huynh (1976, p. 74), it may be seen that 

$$g(x,1) = \int_{-\infty}^{+\infty} h(x,\theta)\,s(\theta)\,d\theta$$

where $h(x,\theta)$ is the bivariate normal density of $x$ and $\theta$. It follows 
that 

$$g(x,1) = f_N(x) \int_{-\infty}^{+\infty} F_N\!\left(\frac{\theta - \theta_o}{\sigma}\right) p(\theta|x)\,d\theta$$

where $f_N(\cdot)$ is the unit normal density. Hence from the derivations 
in the middle part of the previous section, 

$$g(x,1) = f_N(x)\,F_N\!\left(\frac{\rho x - \theta_o}{\sqrt{\rho - \rho^2 + \sigma^2}}\right).$$

The ratio $p(x) = g(x,1)/f_N(x)$ represents the (conditional) 
proportion of students who, at the test score $x$, will succeed in 
performing the referral task. Now let 

$$\alpha = \rho/(\rho - \rho^2 + \sigma^2)^{1/2}$$

and 

$$\beta = -\theta_o/(\rho - \rho^2 + \sigma^2)^{1/2},$$

then 

$$p(x) = F_N(\alpha x + \beta). \tag{6}$$
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If $\xi(x)$ denotes the $100p(x)$ percentile of the unit normal 
distribution, then 

$$\xi(x) = \alpha x + \beta. \tag{7}$$

Now let $\hat{p}(x)$, $\hat{\xi}(x)$ be the observed values of $p(x)$ and $\xi(x)$. Let 
$w(x)$ be a suitably chosen weight function at the score $x$. Then via 
the least squares technique, the estimates for $\alpha$ and $\beta$ are given as 

$$\hat{\alpha} = s(\hat{\xi})\,r(x,\hat{\xi}) \tag{8}$$

and 

$$\hat{\beta} = \bar{\xi} \tag{9}$$

where $\bar{\xi}$ and $s(\hat{\xi})$ are the mean and standard deviation of the $\hat{\xi}(x)$ 
values, and $r(x,\hat{\xi})$ is the correlation between the $x$ and $\hat{\xi}(x)$ values, 
each pair being weighted by $w(x)$. The computation, of course, is 
carried out only over the $x$ values at which the sample values $\hat{p}(x)$ 
are available. The reader may recall that the test scores $x$ are in 
standardized form. 

It may be noted that $p(x)$ is an increasing function of $x$. 
Hence it seems reasonable to require that the sample value $\hat{p}(x)$ be 
a nondecreasing function of $x$. This may be done by applying the 
Pool-Adjacent-Violator algorithm (Barlow, Bartholomew, Bremner, and 
Brunk, 1972, p. 13) using $w(x)$ as the weight function. In addition, 
since all $\hat{p}(x)$ values must be included strictly between 0 and 1, 
the algorithm must be conducted such that the adjusted values $\hat{p}(x)$ 
conform to this requirement. (See Table 1 for an illustration.) 
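A minimal sketch of this adjustment is given below, using the frequencies of Table 1 as weights. The function name `pool_adjacent_violators` is hypothetical. The extra pooling at the two ends, which keeps every adjusted proportion strictly between 0 and 1, is an assumption about how the requirement was met; it does reproduce the adjusted values shown in Table 1.

```python
def pool_adjacent_violators(successes, counts):
    """Weighted isotonic (nondecreasing) fit of success proportions.

    successes[i] : number of referral-successful examinees at score i
    counts[i]    : number of examinees at score i (used as the weight)
    Returns one adjusted proportion per score.
    """
    # Each block holds [pooled successes, pooled count, number of scores].
    blocks = []
    for s, n in zip(successes, counts):
        blocks.append([s, n, 1])
        # Pool while the previous block's proportion exceeds the newest one
        # (cross-multiplied to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s2, n2, k2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    # Assumed boundary rule: pool further so no block equals exactly 0 ...
    while len(blocks) > 1 and blocks[0][0] == 0:
        s2, n2, k2 = blocks.pop(1)
        blocks[0][0] += s2
        blocks[0][1] += n2
        blocks[0][2] += k2
    # ... or exactly 1.
    while len(blocks) > 1 and blocks[-1][0] == blocks[-1][1]:
        s2, n2, k2 = blocks.pop()
        blocks[-1][0] += s2
        blocks[-1][1] += n2
        blocks[-1][2] += k2
    # Expand each block back to one value per test score.
    return [s / n for s, n, k in blocks for _ in range(k)]


counts = [1, 4, 10, 21, 16, 23, 21, 16, 8, 5]
successes = [0, 0, 1, 3, 4, 8, 15, 10, 7, 5]
adjusted = pool_adjacent_violators(successes, counts)
print([round(p, 3) for p in adjusted])
```

Pooled proportions are computed as total successes over total examinees, which is the weighted mean of the raw proportions when the weight is the number of subjects at each score.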

As in any least squares procedure, the weight function $w(x)$ may 
be chosen in a variety of ways. It appears to the author that the 
number of subjects at each test score might serve as a reasonable 
choice for this function. 

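Once the estimates $\hat{\alpha}$ and $\hat{\beta}$ have been determined, the estimates 
for $\theta_o$ and $\sigma^2$ may be derived from Equations (5) and (6). These are 

$$\hat{\theta}_o = -\rho\hat{\beta}/\hat{\alpha} \tag{10}$$

and 

$$\hat{\sigma}^2 = \rho^2/\hat{\alpha}^2 - \rho + \rho^2. \tag{11}$$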
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In the case where Equation (11) yields a negative value, a reasonable 
choice for $\hat{\sigma}^2$ would be 0. 

Numerical Example 2 

Table 1 presents the basic data for this example. The test 
reliability is taken to be $\rho = .90$. The summary data are $\bar{\xi} = 
-.2280$, $s(\hat{\xi}) = .8668$, and $r(x,\hat{\xi}) = .9723$. It follows that $\hat{\alpha} = 
.8427$ and $\hat{\beta} = -.2280$, hence $\hat{\theta}_o = .244$ and $\hat{\sigma}^2 = 1.050$. 
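The arithmetic of Numerical Example 2 can be checked with the short sketch below, which applies Equations (8) through (11) to the reported summary statistics. The function name `ogive_estimates` is hypothetical, and the sketch starts from the summary values rather than recomputing them from Table 1.

```python
def ogive_estimates(xi_mean, xi_sd, r_x_xi, rho):
    """Estimate the normal ogive parameters from weighted summary data.

    xi_mean, xi_sd : weighted mean and standard deviation of the xi(x) values
    r_x_xi         : weighted correlation between x and xi(x)
    rho            : test reliability
    """
    alpha = xi_sd * r_x_xi                 # Equation (8)
    beta = xi_mean                         # Equation (9)
    theta_o = -rho * beta / alpha          # Equation (10)
    sigma2 = rho**2 / alpha**2 - rho + rho**2   # Equation (11)
    # A negative variance estimate is replaced by 0, as suggested in the text.
    sigma2 = max(0.0, sigma2)
    return alpha, beta, theta_o, sigma2


alpha, beta, theta_o, sigma2 = ogive_estimates(-0.2280, 0.8668, 0.9723, 0.90)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, "
      f"theta_o = {theta_o:.3f}, sigma^2 = {sigma2:.3f}")
```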

4. ASSESSING THE CONSEQUENCES 
OF SELECTING A MASTERY SCORE 

Section 2 provides the computation of mastery scores when the 
loss ratio $Q$ is known. In a number of applications, however, the 
test user may not be willing to specify in advance a value for $Q$. 
Instead the user may wish to look at the consequences associated 
with each cutoff score before making a final choice. Such a 
practice is not uncommon in real testing situations. Both Jaeger (1976) 
and Shepard (1976) have advocated an iterative process for setting 
cutoff scores in testing programs such as high school graduation 
or minimum competency testing. 

As in Section 2, let $F_N(\cdot)$ denote the cumulative distribution 
function of the unit normal variable. Given the loss ratio $Q$, the 
mastery score $c_o$ is given by the equation 

$$F_N\!\left[(\rho c_o - \theta_o)/(\rho - \rho^2 + \sigma^2)^{1/2}\right] = \frac{1}{1+Q}.$$

Alternately the selection of $c_o$ as the cutoff score would indicate 
that the weights (or losses) accorded to a false negative error and 
to a false positive error are in the ratio of $Q$ to 1 where 

$$Q = 1/F_N\!\left[(\rho c_o - \theta_o)/(\rho - \rho^2 + \sigma^2)^{1/2}\right] - 1.$$

$Q$ will degenerate to 0 when $c_o$ goes to $+\infty$ (i.e., when all subjects 
are denied mastery) and to $\infty$ when $c_o$ goes to $-\infty$ (i.e., when mastery 
is granted regardless of test score). 
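The relation between a chosen cutoff and its implied loss ratio can be tabulated as in the following sketch. The function name `implied_loss_ratio` is hypothetical; the parameter values in the usage lines are those of Numerical Example 1 and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def implied_loss_ratio(c_o, theta_o, sigma, rho):
    """Loss ratio Q implied by using c_o as the standardized cutoff score."""
    z = (rho * c_o - theta_o) / sqrt(rho - rho**2 + sigma**2)
    # Q = 1 / F_N(z) - 1; very small cutoffs drive F_N(z) toward 0,
    # so Q grows without bound (and may overflow numerically).
    return 1.0 / NormalDist().cdf(z) - 1.0


# Consequences of several candidate cutoffs under the Example 1 parameters.
for c in (0.5, 1.0, 1.391, 2.0):
    q = implied_loss_ratio(c, theta_o=1.0, sigma=0.5, rho=0.90)
    print(f"c_o = {c:5.3f}  implied Q = {q:6.3f}")
```

A test user could scan such a listing to see which weighting of false negative against false positive errors each candidate cutoff would amount to.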

5. SUMMARY AND CONCLUSION 

This study touches on some aspects of the determination of 
passing scores on the basis of the bivariate normal test model. The 
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TABLE 1 

Basic Data for Numerical Example 2 

                                             Raw Test Score 
                                 1       2       3       4       5       6       7       8       9      10 

Frequency of examinees           1       4      10      21      16      23      21      16       8       5 
Frequency of referral- 
  successful examinees           0       0       1       3       4       8      15      10       7       5 
Unadjusted p(x)                  0       0    .100    .143    .250    .348    .714    .625    .875       1 
Pool-Adjacent-Violator- 
  adjusted p(x)               .067    .067    .067    .143    .250    .348    .676    .676    .923    .923 
$\hat{\xi}(x)$              -1.450  -1.450  -1.450  -1.067   -.675   -.391    .457    .457   1.426   1.426 
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loss ratio associated with classification errors is assumed to be 
constant, and the referral success function is assumed to be in the 
normal ogive family. Alternately, the model also provides a fairly 
simple way to assess the loss consequences associated with each 
mastery score. Such information is deemed useful to the test user 
who may wish to examine those consequences before making a final 
choice of cutoff score. 

It should be mentioned that the paper deals with group test 
data for a population of examinees. Thus the various results 
would be useful to the extent that loss consequences are considered 
jointly for the entire population. A procedure for setting passing 
scores on tests in the absence of group data is discussed elsewhere 
(Huynh, 1978; also in press). 
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ABSTRACT 

A general framework for making mastery/nonmastery decisions 
based on multivariate test data is described in this study. 
Overall, mastery is granted (or denied) if the posterior expected loss 
associated with such action is smaller than the one incurred by the 
denial (or grant) of mastery. An explicit form for the cutting 
contour which separates mastery and nonmastery states in the test 
score space is given for multivariate test scores which follow a 
normal distribution with a constant loss ratio. For the case 
involving multiple cutting scores in the true ability space, the 
test score cutting contour will resemble the boundary defined by 
multiple test cutting scores when the test reliabilities are 
reasonably close to unity. For tests with low reliabilities, decisions 
may very well be based simply on a suitably chosen composite score. 

1. INTRODUCTION 

Application of mental measurement to selection or certification 
problems often involves the use of more than one test score. For 

This paper has been distributed separately as RM 79-7, December, 
1979. 
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example, the selection of students for an advanced program in some 
subject area may be based on several traits (variables), such as 
prior achievement, aptitude, interest, etc. Ideally, selection 
should be based on the subject's true measures on these traits; in 
reality, however, decisions are typically based on observed test 
scores which are contaminated with errors of measurement. Thus, 
misclassifications are bound to occur, and rules for decisions 
based on test data are typically formulated in such a way as to 
minimize the risks incurred by misclassification. 

Decision problems based on one variable have been considered 
at length in the literature. Statistical issues involved in 
establishing a single cutoff (cutting, passing, or mastery) score are 
described in detail in a number of sources including Swaminathan, 
Hambleton, and Algina (1975); Huynh (1976, 1977, 1979, 1980); 
Wilcox (1976); and van der Linden and Mellenbergh (1977). Huynh 
(1979, 1980) also provides an explicit relationship among test 
cutting score, losses incurred by misclassification, and errors of 
measurement. In general, within the minimax or empirical Bayes 
decision framework, it is found that errors in measurement will 
reduce the test cutting score when a false negative error is more 
serious than a false positive error. Conversely, the test cutting 
score will increase when a false negative error is less serious 
than a false positive error. 

The effect of errors of measurement in selection situations 
involving multiple true cutting scores has been considered by Lord 
(1962). The selection framework used involves the regression line 
expressing the amount of "desirability" assigned to different 
examinees as a function of the observed test scores. Using the 
multivariate normal distribution to describe the true and observed 
scores, Lord was able to plot the contour line in the observed 
test score plane which separates the subjects deemed acceptable 
(masters) from those judged as unacceptable (nonmasters). Lord's 
paper, however, does not appear to come naturally from decision 
theory as formulated by Wald (1950) or as prescribed in Ferguson 
(1967). 
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The purpose of this paper is twofold. First it will describe 
a general empirical Bayes solution to the "plotting" of a cutting 
contour in selection situations involving multiple test scores. 
Second, it will explore the influence of the loss ratio on the 
cutting contour and will reexamine the distortion caused by errors 
of measurement (Lord, 1962), using an empirical Bayes decision- 
theoretic framework. Examples based on the multivariate normal 
distribution with constant losses for misdecisions are provided to 
illuminate various points or procedures put forward in the paper. 

2. EMPIRICAL BAYES APPROACH TO CUTTING CONTOUR 

Now let the vector $\theta = (\theta_1, \ldots, \theta_k)'$ denote the true scores 
(measures) of an individual subject on $k$ traits (or selection 
variables). Let $\Omega$ represent the region in the true score space 
where a subject must be located in order to qualify for the true 
state of mastery. Thus a subject is defined as a true master if 
$\theta \in \Omega$. Let $\Omega^c$ be the complement of $\Omega$. Then a subject is declared 
a true nonmaster when $\theta \in \Omega^c$. 

Now let the vector $x = (x_1, x_2, \ldots, x_k)'$ represent the observed 
test scores of the subject. On the basis of $x$ and other prior 
information regarding $\theta$, a decision may be made concerning the 
subject: either to grant mastery (action $a_1$) or to deny mastery 
(action $a_2$). When $\theta \in \Omega$, the best course of action is $a_1$, and no 
loss will be encountered. Similarly, action $a_2$ is best when $\theta \in \Omega^c$. 
For other situations, classification errors occur. To be specific, 
the choice of action $a_2$ when $\theta \in \Omega$ constitutes a false negative 
error, whereas the selection of $a_1$ when $\theta \in \Omega^c$ produces a false 
positive error. 

Let $C_s(\theta)$ be the loss associated with a false negative error 
and $C_f(\theta)$ be the loss encountered by a false positive error. Let 
$p(\theta|x)$ be the posterior probability density of $\theta$ given that the 
test score vector $x$ has been observed. Given $x$, the posterior 
expected loss encountered in taking action $a_1$ is given by the 
integral $R(a_1|x) = \int_{\Omega^c} C_f(\theta)\,p(\theta|x)\,d\theta$. 
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Similarly, the posterior loss associated with the choice of 
action $a_2$ is $R(a_2|x) = \int_{\Omega} C_s(\theta)\,p(\theta|x)\,d\theta$. 

It follows from Bayes (or empirical Bayes) decision theory as 
expressed, for example, in Ferguson (1967) that, in the test score 
space generated by the test score vector $x$, the cutting contour $S$ 
separating the two actions $a_1$ (granting mastery) and $a_2$ (denying 
mastery) is defined by the equality $R(a_1|x) = R(a_2|x)$. In other 
words, the line (or surface) $S$ consists of all points $x$ at which 

$$\int_{\Omega} C_s(\theta)\,p(\theta|x)\,d\theta = \int_{\Omega^c} C_f(\theta)\,p(\theta|x)\,d\theta. \tag{1}$$

The following section explores in detail the implications of 
Equation (1) for the case involving constant losses and multiple 
true cutting scores. 

3. CUTTING CONTOUR FOR CONSTANT LOSSES 
AND MULTIPLE TRUE CUTTING SCORES 

Let losses be constant and expressed as $C_f(\theta) = 1$ and $C_s(\theta) = Q$ 
in the region where they do not vanish. In other words, $Q$ is the 
ratio of the false negative loss to the false positive loss. In 
addition, let $\Omega$ be the "upper right" corner defined by the true 
cutting scores $\theta_1^*, \theta_2^*, \ldots, \theta_k^*$. In other words, 

$$\Omega = \{\theta;\ \theta_1 \ge \theta_1^*,\ \theta_2 \ge \theta_2^*,\ \ldots,\ \theta_k \ge \theta_k^*\}.$$

With constant losses Equation (1) may now be written as 

$$Q \int_{\Omega} p(\theta|x)\,d\theta = \int_{\Omega^c} p(\theta|x)\,d\theta.$$

Since $\Omega \cup \Omega^c$ spans the entire space for $\theta$, it follows that 

$$\int_{\Omega} p(\theta|x)\,d\theta + \int_{\Omega^c} p(\theta|x)\,d\theta = 1.$$

With this relationship, Equation (1) becomes 

$$\int_{\Omega} p(\theta|x)\,d\theta = \frac{1}{1+Q}, \tag{2}$$

which may be written, using the given multiple true cutting scores, 
as 

$$\Pr(\theta_1^* \le \theta_1,\ \theta_2^* \le \theta_2,\ \ldots,\ \theta_k^* \le \theta_k \mid x) = \frac{1}{1+Q}. \tag{3}$$

The line consisting of the points of coordinate $x$ which satisfy 
Equation (2) or (3) defines the boundary between granting and 
denying mastery in the test score space. This boundary line will be 
referred to as a cutting contour. 
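The decision rule implied by Equations (1) through (3) can be written out as a small sketch. The function name `classify` and its return labels are hypothetical, and the posterior probability of true mastery is taken as given here; the choice to grant mastery exactly on the contour is an arbitrary convention of this sketch.

```python
def classify(prob_master, Q):
    """Choose the action with the smaller posterior expected loss.

    prob_master : posterior probability that theta lies in the mastery region
    Q           : false negative loss divided by false positive loss
    """
    risk_grant = 1.0 * (1.0 - prob_master)   # R(a1|x): false positive loss is 1
    risk_deny = Q * prob_master              # R(a2|x): false negative loss is Q
    # The two risks are equal when prob_master = 1/(1+Q), the cutting contour.
    return "grant" if risk_grant <= risk_deny else "deny"


print(classify(0.60, Q=1.0))
print(classify(0.20, Q=1.0))
print(classify(0.20, Q=4.5))
```

With a larger loss ratio, the same posterior probability can switch the decision from denial to grant, since the threshold 1/(1+Q) moves downward.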
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4. CUTTING CONTOUR IN MULTIVARIATE NORMAL TEST SCORES 

For illustrative purposes, let it be assumed that the true 
score vector $\theta$ for a population of subjects follows a multivariate 
normal distribution with mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_k)'$ and with 
covariance matrix $\Sigma_\theta = (\sigma_{ij})$. In the terminology of empirical 
Bayes statistics, this statement is equivalent to the requirement 
that the prior distribution of the true score vector $\theta$ be the same 
for all subjects in the population under study. This common prior 
distribution may be estimated from historical test score data or by 
procedures which are consistent with classical measurement theory 
and practice. 

The difference vector $e = x - \theta$ represents the errors of 
measurement. It will be assumed that the $k$ components of $e$ are 
normally and independently distributed, each with a mean of zero 
and a variance of $\epsilon_i^2$, $i = 1, 2, \ldots, k$, free of $\theta$. In addition, it 
will be assumed that the two vectors $e$ and $\theta$ are stochastically 
independent. To simplify the notation, let $\Sigma_e$ be the diagonal 
matrix with elements $\epsilon_i^2$. 

It follows from classical measurement theory and from known 
properties of multivariate normal distributions that the joint 
distribution of $x$ and $\theta$ is multivariate normal with a mean vector 
of $\mu$ for both $x$ and $\theta$ and with a covariance matrix defined as 

$$\begin{pmatrix} \Sigma_x & \Sigma_\theta \\ \Sigma_\theta & \Sigma_\theta \end{pmatrix}$$

where $\Sigma_x = \Sigma_\theta + \Sigma_e$. Hence the posterior distribution of $\theta$ given 
the test score $x$ is multivariate normal with mean vector $\xi(x) = 
(\xi_1, \xi_2, \ldots, \xi_k)' = \mu + \Sigma_\theta \Sigma_x^{-1}(x - \mu)$ and with covariance matrix 
$\Lambda = (\lambda_{ij}) = \Sigma_\theta - \Sigma_\theta \Sigma_x^{-1} \Sigma_\theta$. The vector $\xi(x)$ is a function of the 
test score vector $x$. On the other hand, the matrix $\Lambda$ is free of $x$. 
Now let us consider the standardized variables $y_1, y_2, \ldots, y_k$ 
where 

$$y_i = [\theta_i - \xi_i(x)]/\sqrt{\lambda_{ii}}, \quad i = 1, 2, \ldots, k.$$
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Each of these variables has zero mean and unit variance. Let $\Gamma$ be 
the correlation matrix associated with $\Lambda$ (i.e., $\Gamma$ is the covariance 
matrix of the $y_i$ variables). In addition, let 

$$y_i^* = [\theta_i^* - \xi_i(x)]/\sqrt{\lambda_{ii}}, \quad i = 1, 2, \ldots, k. \tag{4}$$

Then the cutting contour separating the two actions $a_1$ and $a_2$ in 
the test score space is defined by the equality 

$$\Pr(y_1^* \le y_1,\ y_2^* \le y_2,\ \ldots,\ y_k^* \le y_k) = \frac{1}{1+Q} \tag{5}$$

where the random vector $y = (y_1, y_2, \ldots, y_k)'$ follows a multivariate 
normal distribution with zero means, unit variances, and correlation 
matrix $\Gamma$ free of $x$. 

Consider now the set $\gamma$ consisting of the points with 
coordinates $(y_1^*, y_2^*, \ldots, y_k^*)$ which satisfy Equation (5). Tihansky (1970) 
refers to this set as an equidistributional contour and provides 
ways to construct contours of this type for bivariate normal 
distributions. The contour $\gamma$ depends only on $\Gamma$ which does not involve 
the observed test score vector $x$. Once it has been constructed, 
the cutting contour $C$ in the test score space may be plotted via 
the system of linear equations represented by 

$$\mu + \Sigma_\theta \Sigma_x^{-1}(x - \mu) = \xi, \tag{6}$$

where 

$$\xi_i = \theta_i^* - y_i^*\sqrt{\lambda_{ii}}, \quad i = 1, 2, \ldots, k.$$

Where computer facilities are available, equidistributional 
contours may be drawn via the Newton-Raphson iteration process for 
nonlinear equations. For example, let $(y_1, y_2)'$ follow a 
standardized bivariate normal distribution with correlation $\rho$. Let $\alpha$ be 
any number between 0 and 1, and $u$ be such that $\Pr(u \le y_1) > \alpha$. We 
will search for the value $v$ at which $G(v) = 0$, where 

$$G(v) = \Pr(y_1 \ge u,\ y_2 \ge v) - \alpha = \Pr(y_1 \le -u,\ y_2 \le -v) - \alpha. \tag{7}$$

The derivative of $G(v)$ with respect to $v$ is given as 

$$G'(v) = -(2\pi)^{-1/2} \exp\!\left(-\frac{v^2}{2}\right) P(y_1 \le -u \mid y_2 = -v). \tag{8}$$
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Here the conditional distribution of $y_1$ given $y_2 = -v$ is a normal 
distribution with a mean of $-\rho v$ and a standard deviation of $(1-\rho^2)^{1/2}$. 
Hence 

$$G'(v) = -(2\pi)^{-1/2} \exp\!\left(-\frac{v^2}{2}\right) P\!\left(Z \le \frac{-u + \rho v}{(1-\rho^2)^{1/2}}\right) \tag{9}$$

where $Z$ is the standardized normal variable. The values of $G(v)$ 
and $G'(v)$ may be obtained via computer programs such as MDBNOR 
(IMSL, 1977) and the Fortran IV library function ERFC. Both $G(v)$ 
and $G'(v)$ are needed in the Newton-Raphson iteration process. This 
procedure has been found to converge when $u$ is not too close to the 
upper bound $u_o$ at which $P(u_o \le y_1) = \alpha$. (It may be noted that the 
bivariate equidistributional contour has two asymptotes defined as 
$u = u_o$ and $v = u_o$. Thus small variations in a $u$ value near $u_o$ will 
tend to associate with substantial changes in the $v$ values; because 
of this, the iteration process may fail. However, since $P(y_1 \ge u, 
y_2 \ge v) = P(y_1 \ge v, y_2 \ge u)$, the contour is symmetric with respect 
to the first diagonal in the $(u,v)$-plane. Thus it is necessary to 
iterate the $v$ value for each $u$ sufficiently smaller than the upper 
bound $u_o$, and then to resort to symmetry to complete the drawing of 
the contour.) 
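The iteration can be sketched with the Python standard library alone. In this sketch the bivariate normal probability is obtained by Simpson's rule applied to a one-dimensional integral, which stands in for library routines such as MDBNOR; the function names `bvn_lower` and `contour_v`, the integration limits, and the cap of one unit on each Newton step are all assumptions of the sketch rather than features of the original procedure.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # unit normal distribution


def bvn_lower(a, b, rho, n=2000, lo=-8.5):
    """P(y1 <= a, y2 <= b) for a standardized bivariate normal, |rho| < 1.

    Integrates f(t) * P(y1 <= a | y2 = t) over t from lo to b by Simpson's rule.
    """
    if b <= lo:
        return 0.0
    s = sqrt(1.0 - rho * rho)
    h = (b - lo) / n  # n must be even for Simpson's rule

    def f(t):
        return STD.pdf(t) * STD.cdf((a - rho * t) / s)

    total = f(lo) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def contour_v(u, alpha, rho, v0=0.0, tol=1e-9, max_iter=50):
    """Solve G(v) = P(y1 >= u, y2 >= v) - alpha = 0 by Newton-Raphson."""
    if 1.0 - STD.cdf(u) <= alpha:
        raise ValueError("a solution requires Pr(u <= y1) > alpha")
    s = sqrt(1.0 - rho * rho)
    v = v0
    for _ in range(max_iter):
        g = bvn_lower(-u, -v, rho) - alpha                  # Equation (7)
        dg = -STD.pdf(v) * STD.cdf((-u + rho * v) / s)      # Equation (9)
        step = max(-1.0, min(1.0, g / dg))                  # capped Newton step
        v -= step
        if abs(step) < tol:
            return v
    raise RuntimeError("Newton-Raphson iteration did not converge")


# A few contour points for correlation 1/9 and alpha = 1/(1+Q) with Q = 1.
for u in (-2.0, -1.0, -0.5):
    print(u, round(contour_v(u, alpha=0.5, rho=1.0 / 9.0), 4))
```

As the text notes, values of `u` close to the upper bound make the derivative very small, so the sketch is meant for `u` well below that bound, with symmetry supplying the remaining branch of the contour.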

The drawing of an equidistributional contour for any $k$-variate 
normal distribution may be accomplished in the same way via the 
Newton-Raphson iteration process previously described. The details 
are straightforward and therefore are not presented here. 
Multivariate normal probabilities of the form $P(y_1^* \le y_1,\ y_2^* \le y_2,\ \ldots,\ 
y_k^* \le y_k)$ may be evaluated via computer programs such as the one 
described in Milton (1972). 

It may be noted that the contour $\gamma$ does not depend on the two 
vectors $\theta^*$ and $\mu$. In addition, in the transformation from $\gamma$ to $C$ 
as defined by (6), these two vectors act only to indicate the new 
location of the transformed curve. It follows that the shape of 
the cutting contour $C$ does not depend on either the vector $\mu$ or the 
vector $\theta^*$. 
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5. AN ILLUSTRATION OF CUTTING CONTOUR 

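Consider now a selection based on two variables defined by the 
true scores $\theta_1$ and $\theta_2$, and by the observed test data $x_1$ and $x_2$. It 
will be assumed, as in Lord (1962), that both $x_1$ and $x_2$ are in their 
standardized form and have a common reliability coefficient of .90. 
In addition, let the correlation between $x_1$ and $x_2$ be .60. It 
follows that the matrices $\Sigma_x$ and $\Sigma_\theta$ are defined as 

$$\Sigma_x = \begin{pmatrix} 1.00 & .60 \\ .60 & 1.00 \end{pmatrix} \quad \text{and} \quad \Sigma_\theta = \begin{pmatrix} .90 & .60 \\ .60 & .90 \end{pmatrix}.$$

With 

$$\Sigma_x^{-1} = \frac{1}{.64} \begin{pmatrix} 1.00 & -.60 \\ -.60 & 1.00 \end{pmatrix},$$

it follows that 

$$\Sigma_\theta \Sigma_x^{-1} = \frac{1}{.64} \begin{pmatrix} .54 & .06 \\ .06 & .54 \end{pmatrix} = \begin{pmatrix} .84375 & .09375 \\ .09375 & .84375 \end{pmatrix}$$

and 

$$\Lambda = \begin{pmatrix} .90 & .60 \\ .60 & .90 \end{pmatrix} - \frac{1}{.64} \begin{pmatrix} .522 & .378 \\ .378 & .522 \end{pmatrix} = \begin{pmatrix} .084375 & .009375 \\ .009375 & .084375 \end{pmatrix}.$$

Thus the posterior distribution of $\theta = (\theta_1, \theta_2)'$ given the test data 
$x = (x_1, x_2)'$ is bivariate normal with mean vector $\xi(x) = (\xi_1, \xi_2)'$ 
where $\xi_1 = .84375x_1 + .09375x_2$ and $\xi_2 = .09375x_1 + .84375x_2$. The 
posterior standard deviations are $(.084375)^{.5} = .29047$ for both $\theta_1$ 
and $\theta_2$, and the posterior correlation between $\theta_1$ and $\theta_2$ is 
$.009375/.084375 = .11111$. 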
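The matrix arithmetic above can be verified with a short sketch restricted to the 2 x 2 case. The helper names `inv2`, `mul2`, and `posterior_weights_and_cov` are hypothetical and are written out by hand only so that the example needs no external library.

```python
from math import sqrt


def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mul2(p, q):
    """Product of two 2 x 2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def posterior_weights_and_cov(sigma_theta, sigma_x):
    """Return (Sigma_theta Sigma_x^-1, Sigma_theta - Sigma_theta Sigma_x^-1 Sigma_theta)."""
    weights = mul2(sigma_theta, inv2(sigma_x))
    shrink = mul2(weights, sigma_theta)
    cov = [[sigma_theta[i][j] - shrink[i][j] for j in range(2)] for i in range(2)]
    return weights, cov


sigma_x = [[1.00, 0.60], [0.60, 1.00]]
sigma_theta = [[0.90, 0.60], [0.60, 0.90]]
weights, cov = posterior_weights_and_cov(sigma_theta, sigma_x)
print("regression weights:", weights)
print("posterior s.d.:", sqrt(cov[0][0]))
print("posterior correlation:", cov[0][1] / cov[0][0])
```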

It may then be deduced from the equations represented by (4) 
that 

$$y_1^* = [\theta_1^* - (.84375x_1 + .09375x_2)]/.29047$$

and 

$$y_2^* = [\theta_2^* - (.09375x_1 + .84375x_2)]/.29047.$$
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To draw the $(x_1, x_2)$ contour line, let us suppose that $\theta_1^* = \theta_2^* 
= 0$. The two equations represented by (6) can be written as 

$$.84375x_1 + .09375x_2 = -.29047y_1^*$$
$$.09375x_1 + .84375x_2 = -.29047y_2^*$$

or equivalently 

$$x_1 = -.34857y_1^* + .03873y_2^*$$
$$x_2 = .03873y_1^* - .34857y_2^*.$$

In the above equations, the point at coordinate $(y_1^*, y_2^*)$ belongs 
to the equidistributional contour line defined by $P(y_1^* \le y_1,\ y_2^* \le y_2) 
= 1/(1+Q)$, where $(y_1, y_2)'$ follows a standardized bivariate normal 
distribution with correlation .11111. It may be recalled that $Q$ is 
the ratio of the false negative error loss to the false positive 
error loss. 
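The transformation from a point on the equidistributional contour to the test score plane amounts to solving a 2 x 2 linear system, as in the sketch below. The function name `contour_point_to_scores` is hypothetical, the numerical constants are those of this illustration only, and the prior means are taken as zero because the scores are standardized.

```python
from math import sqrt


def contour_point_to_scores(y1_star, y2_star, theta_star=(0.0, 0.0)):
    """Map (y1*, y2*) to (x1, x2) for the illustration of this section."""
    w_diag, w_off = 0.84375, 0.09375     # elements of Sigma_theta Sigma_x^-1
    post_sd = sqrt(0.084375)             # common posterior standard deviation
    # Right-hand sides of (6): xi_i = theta_i* - y_i* sqrt(lambda_ii).
    r1 = theta_star[0] - post_sd * y1_star
    r2 = theta_star[1] - post_sd * y2_star
    # Solve the symmetric 2 x 2 system by Cramer's rule.
    det = w_diag * w_diag - w_off * w_off
    x1 = (w_diag * r1 - w_off * r2) / det
    x2 = (w_diag * r2 - w_off * r1) / det
    return x1, x2


# Unit contour points recover the coefficients quoted in the text.
print(contour_point_to_scores(1.0, 0.0))
print(contour_point_to_scores(0.0, 1.0))
```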

For purposes of illustration, the steps previously described 
were implemented in drawing the cutting contours associated with 
the loss ratios $Q = 1/3$, 1, and 4. These contours are depicted in 
Figure I. 

6. EFFECT OF LOSS RATIO ON CUTTING CONTOUR 

In Figure I, the upper right region bounded by each cutting 
contour consists of the test score points at which mastery is 
granted. It may be observed that the mastery region expands as the 
loss ratio Q increases. This conclusion is to be expected. If the 
consequences due to a false negative error become more serious (i.e., 
Q increases), then the classification (or selection) procedure 
should be so designed as to reduce the probability of this error. 
Thus the size of the nonmastery set must be reduced, and as a 
consequence, it becomes more likely that mastery will be granted. 

In general, let the set $A^*(Q_1)$ consist of all points $y^* = 
(y_1^*, y_2^*, \ldots, y_k^*)$ for which 

$$P(y_1^* \le y_1,\ y_2^* \le y_2,\ \ldots,\ y_k^* \le y_k) > 1/(1+Q_1) \tag{10}$$
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and let $A(Q_1)$ be the corresponding region in the test score space. 
It may be verified that in $A(Q_1)$ the expected posterior losses 
associated with the two actions $a_1$ (granting mastery) and $a_2$ 
(denying mastery) satisfy the inequality $R(a_1|x) < R(a_2|x)$. Thus the 
set $A(Q_1)$ consists of test score points at which the subject is 
declared a master. Now let $Q_2$ be a second loss ratio such that 
$Q_1 < Q_2$. This is equivalent to $1/(1+Q_1) > 1/(1+Q_2)$. Let $A(Q_2)$ have 
the same meaning as above. Then any test score points which belong 
to $A(Q_1)$ must also belong to $A(Q_2)$. In other words, the inequality 
$Q_1 < Q_2$ implies that $A(Q_1) \subset A(Q_2)$. Thus, as the loss ratio $Q$ 
increases, the mastery region in the test score space will expand. 
By the same line of reasoning, when $Q$ decreases, the mastery region 
will be reduced in size. 

7. EFFECT OF ERRORS OF MEASUREMENT ON CUTTING CONTOUR 

To illustrate the effect of errors of measurement on the 
cutting contour in the test score space, let it be assumed as in the 
previous section that the test scores $x_1$ and $x_2$ are in their 
standardized forms and have a correlation of .60. In addition, let 
it be assumed that $x_1$ and $x_2$ are equally reliable with common 
reliability coefficient $\rho$, and that $\theta_1^* = \theta_2^* = 0$. 

It follows from the equations represented by (6) that 

$$\begin{aligned}
1.25(\rho - .36)x_1 + .75(1-\rho)x_2 &= -(-\rho^2 + 1.36\rho - .36)^{1/2}\,y_1^* \\
.75(1-\rho)x_1 + 1.25(\rho - .36)x_2 &= -(-\rho^2 + 1.36\rho - .36)^{1/2}\,y_2^*.
\end{aligned} \tag{11}$$

In these expressions, the point $(y_1^*, y_2^*)$ belongs to an appropriate 
equidistributional contour associated with the standardized 
bivariate normal distribution with correlation $\delta = .6(1-\rho)/(\rho - .36)$. 

It may be deduced from the positive semidefiniteness of the 
covariance matrix of $(\theta_1, \theta_2)$ that the common reliability $\rho$ must be 
between .60 and 1.00. As a function of $\rho$, the posterior 
correlation $\delta$ is a decreasing function, assuming the value of 1.00 when 
$\rho = .60$ and having the limit of 0 when $\rho$ tends to 1.00. 
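The dependence of the posterior correlation and of the contour equations on the common reliability can be tabulated with the sketch below. The function names are hypothetical, and the coefficients are derived here from the posterior mean and covariance of Section 4 for two standardized tests correlating .60, with each equation multiplied through by .8.

```python
from math import sqrt


def posterior_correlation(rho):
    """Posterior correlation between theta_1 and theta_2 for reliability rho."""
    if not 0.60 <= rho < 1.0:
        raise ValueError("rho must lie in [.60, 1.00) for this illustration")
    return 0.6 * (1.0 - rho) / (rho - 0.36)


def contour_equation_coefficients(rho):
    """Return (a, b, scale) such that a*x1 + b*x2 = -scale*y1* and
    b*x1 + a*x2 = -scale*y2*."""
    a = 1.25 * (rho - 0.36)
    b = 0.75 * (1.0 - rho)
    # scale equals .8 times the common posterior standard deviation.
    scale = sqrt(-rho**2 + 1.36 * rho - 0.36)
    return a, b, scale


for rho in (0.95, 0.90, 0.80, 0.65):
    a, b, scale = contour_equation_coefficients(rho)
    print(f"rho = {rho:.2f}  delta = {posterior_correlation(rho):.4f}  "
          f"a = {a:.4f}  b = {b:.4f}  scale = {scale:.4f}")
```

At a reliability of .90 the three coefficients are .8 times the values .84375, .09375, and .29047 used in Section 5.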
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When $\rho$ approaches the upper limit 1.00, the posterior 
distribution of $(\theta_1, \theta_2)$ will degenerate at the point $(x_1, x_2)$. (It may 
be noted that when $\rho = 1$, the posterior covariance matrix $\Lambda$ as 
defined in Section 4, i.e., $\Sigma_\theta - \Sigma_\theta \Sigma_x^{-1} \Sigma_\theta$, will vanish.) Given the 
test score vector $x = (x_1, x_2)'$, formally, the posterior expected 
loss for taking action $a_1$, $R(a_1|x)$, is equal to 0 when $x \in \Omega$ and 
1 when $x \in \Omega^c$. Similarly, $R(a_2|x)$ is equal to $Q$ when $x \in \Omega$ and 
0 when $x \in \Omega^c$. Thus, mastery is granted when $x_1 \ge 0$ and $x_2 \ge 0$. 
When either $x_1 < 0$ or $x_2 < 0$ (or both), mastery is denied. In 
summary, when $\rho$ tends to unity, the cutting contour line in the 
test score space will approach the cutting contour line defined 
in the true score space. 

Consider now the other limiting situation where $\rho$ tends to .60 
and $\delta$ goes to 1.00. The entire bivariate probability of $(y_1, y_2)'$ 
is now concentrated on the diagonal $y_1 = y_2$. Let $y_o$ be the point 
at which $P(y_o \le y_1) = 1/(1+Q)$ where $y_1$, as previously defined, is a 
standardized normal variable. The equidistributional contour line 
is now comprised of the two half lines defined by (i) $y_1^* = y_o$ and 
$y_2^* \le y_o$, and (ii) $y_2^* = y_o$ and $y_1^* \le y_o$. Both half lines start at 
the point $(y_o, y_o)$ and extend to $-\infty$, one vertically and the other 
horizontally. 

The equations (11) now become

\[
\begin{aligned}
x_1 + x_2 &= -.32\,y_1^* \\
x_1 + x_2 &= -.32\,y_2^* .
\end{aligned}
\]

It follows that the cutting contour in the observed test score
space is the straight line defined by the equation \(x_1 + x_2 = -.32\,y_0\).
The decision regarding granting or denying mastery in this case is
actually based on the composite score \(x_1 + x_2\) although separate
cutting scores have been set in the true score space!

For purposes of illustration, cutting contours are drawn for
the reliability coefficients of \(\rho\) = .95, .80, and .65, and with the
loss ratio \(Q = 1\). The contours are shown in Figure II.

8. SUMMARY

A general framework for making mastery/nonmastery decisions 
based on multivariate test data is described in this study.
Overall, mastery is granted (or denied) if the posterior expected loss
associated with such action is smaller than the one incurred by the
denial (or grant) of mastery. An explicit form for the cutting
contour which separates mastery and nonmastery states in the test 
score space is given for multivariate test scores which follow a 
normal distribution with a constant loss ratio. 

For the case involving multiple cutting scores in the true 
ability space, the test score cutting contour will resemble the 
boundary defined by multiple test cutting scores when the test 
reliabilities are reasonably close to unity. For tests with low 
reliabilities, decisions may very well be based simply on a suitably 
chosen composite score.

ABSTRACT

Two versions of the Nedelsky procedure for setting minimum
passing scores are compared. Two groups of judges, one using each
version, set passing scores for a classroom test. Comparisons of
the resulting sets of passing scores are made on the basis of (1)
the raw distributions of passing scores, (2) the consistency of
pass-fail decisions between the two versions, and (3) the
consistency of pass-fail decisions between each version and the
passing score established by the test designer. The two versions of
the procedure are found to produce essentially equivalent results.
In addition, a significant relationship is observed between the
passing score set by a judge and that judge's level of achievement
in the content area of the test.

This paper has been distributed separately as RM 80-1, March, 1980.

1. INTRODUCTION

Passing scores are needed in a broad variety of situations,
including (a) entrance examinations, (b) tests for advancement of
students from unit to unit in individually prescribed instructional
programs, (c) minimum competency testing, and (d) certification
or licensing examinations. Though writers such as Glass
(1978) charge that passing scores for minimum competency testing
are usually selected arbitrarily and frequently used unwisely,
others (Hambleton, 1978; Shepard, 1976) have documented the need
for cutoff scores in such areas as objectives-based programs and
individualized instruction. This paper presumes the practical
necessity of passing scores and explores ways in which they can
be established more objectively.

Procedures for Setting Passing Scores 

Various procedures for setting passing scores or "standards" 
have been developed (see Meskauskas, 1976). Most can be placed 
into one of three broad categories: (a) comparisons with the per- 
formance of others, (b) considerations of the consequences of 
misclassification, and (c) examinations of item content. 
Standard-setting procedures in the first two categories generally 
require actual student response data or assume a theoretical, 
statistical distribution of such data; content-based methods use 
judgements of content experts. Content-based methods frequently 
are used with tests when student performance data are not avail- 
able. 

Methods for determining passing scores by analyzing test content
require a judge or group of judges to estimate the probable
score of a hypothetical examinee responding at the level of minimum
acceptable performance. Three of the best-known content-based
procedures are those proposed by Angoff (1971), Ebel (1972), and
Nedelsky (1954). In using the Angoff method, each judge estimates
the probability that the "minimally acceptable person" would
respond correctly to each item; the passing score is determined by
summing the estimated item probabilities (Angoff, 1971; Zieky and
Livingston, 1977). In the Ebel procedure, judges sort items into
categories of "relevance" and "difficulty." Each judge then
estimates the proportion of correct answers in each category expected
of a "minimally qualified" examinee. The passing score is the
weighted sum of these proportions, with the weight for each
category being the number of items it contains (Ebel, 1972). The
Nedelsky method is restricted to multiple-choice tests. Every
response option is considered by each judge, who decides which
options could be rejected as incorrect by an examinee performing at
the minimum passing level. The probability that someone at this
level would respond correctly to the item is taken to be the
reciprocal of the number of remaining options (i.e., one divided by
the number of options that the minimally performing examinee
should not be able to reject). The passing score is the sum of
these reciprocals for all items. (In the original formulation,
Nedelsky (1954) offers further refinements, such as estimating
the standard deviation of the chance distribution of scores and
using it in conjunction with setting the passing score. These
refinements are not considered in this paper.) In all cases, the
passing score can be expressed as a fraction or percentage of the
total number of items.
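
The three content-based computations described above can be sketched in a few lines. This is a minimal illustration for a single judge, assuming the inputs have already been collected; the function names and the small data sets are hypothetical and do not come from the study.

```python
from typing import List, Tuple


def angoff_passing_score(item_probabilities: List[float]) -> float:
    """Sum of the judged probabilities of a correct response, one per item."""
    return sum(item_probabilities)


def ebel_passing_score(categories: List[Tuple[int, float]]) -> float:
    """Weighted sum over categories.

    Each entry is (number of items in the category, judged proportion of
    correct answers for a minimally qualified examinee).
    """
    return sum(n_items * proportion for n_items, proportion in categories)


def nedelsky_passing_score(options_per_item: List[int],
                           rejected_per_item: List[int]) -> float:
    """Sum over items of 1 / (number of options not rejected).

    rejected_per_item counts the incorrect options the minimally performing
    examinee should be able to reject, so at least one option (the correct
    one) always remains.
    """
    total = 0.0
    for n_options, n_rejected in zip(options_per_item, rejected_per_item):
        remaining = n_options - n_rejected
        if remaining < 1:
            raise ValueError("at least the correct option must remain")
        total += 1.0 / remaining
    return total


if __name__ == "__main__":
    # Hypothetical four-item, four-option test judged by one person.
    score = nedelsky_passing_score([4, 4, 4, 4], [0, 1, 2, 3])
    print(f"Nedelsky passing score: {score:.2f} of 4 items")
```

Dividing any of the three results by the number of items gives the passing score as a fraction of the test, as noted in the text.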

Comparisons of the Application of the Method

The methods discussed above, though operationally quite
different, have strong logical similarities. It might seem that 
they could be expected to produce equivalent passing scores. Re- 
search reported in the literature indicates that this equivalence 
is not always observed. In a study comparing the Ebel and Nedelsky 
procedures, Andrew and Hecht (1976) found that the two standard- 
setting methods produced significantly different passing scores. 
Perhaps an even more important consideration was that 45 percent

SAUNDERS, RYAN, & HUYNH 



of the examinees being tested were classified differently by the 
two passing scores (Glass, 1978). In research utilizing the 
Nedelsky and Angoff procedures, Brennan and Lockwood (1979) also 
reported a substantial difference in the resulting passing scores. 

When several judges are used, the variation among judges'
individual passing scores also can become an issue. A certain
degree of variation might be expected. It is usually suggested 
that the different passing scores be reconciled either by 
averaging the scores or by requiring judges to reach a consensus 
passing score. Andrew and Hecht (1976) found that passing scores 
obtained by consensus and by averaging did not differ significantly. 
In at least one reported case, however, the amount of variation 
among passing scores set by a group of judges using the Nedelsky 
procedure was substantial, and the procedure was rejected as un- 
feasible (Meskauskas and Webster, 1975). The averaging process 
treats the variation in passing scores as random or "error" varia- 
tion. It might be, however, that differences in passing scores 
are related systematically to characteristics of the judges. If 
passing scores are to be useful, they should not depend too much 
on the characteristics of a particular judge or group of judges. 
Such characteristics, once identified, possibly could be con- 
trolled to prevent them from exerting an undue influence on the 
standard-setting process. One characteristic which intuitively
might be expected to show such a relationship is the judge's own
level of achievement in the relevant area.

Focus of this Paper 

This paper deals only with the Nedelsky procedure. Two versions
of the procedure appear to be in use. In the first version,
judges must classify response options into two categories: (a)
those which should be rejected as incorrect by the minimally
performing examinee, and (b) those which should not. In the
alternative version, a third category, "undecided," also is used when
the judge is unable to classify the response option as one that
either should or should not be rejected. Decisions between the
two versions seem to be based on the preferences of the judges,
rather than any theoretical consideration (e.g., Paiva and Vu,
1979; Smilansky and Guerin, 1976). Nedelsky (1954) discussed the
use of the alternative procedure; he apparently felt the two
versions were equivalent.

The purpose of this paper is twofold. First, a comparison 
is made between the two versions of the Nedelsky procedure. 
Second, the relationship between the achievement levels of judges 
and the passing scores they set will be assessed. 

2. METHOD 

Subjects 

In order to compare the two versions of the Nedelsky pro- 
cedure, subjects acting as judges were divided into two groups. 
Group A used the two-category version of the procedure to set 
passing scores on an achievement test, while Group B used the 
three-category version. The results were compared using the dis- 
tributions of passing scores, as well as the consistency of 
decisions based upon the scores. Also, to determine the relationship
between judges' achievement and passing score, the correlation
between measures of the two variables was calculated.

Data for the study were obtained from students in an intro- 
ductory course in educational research and measurement. The course 
was conducted via videotape at a number of regional campuses of a 
large state university. All subjects were graduate students; many 
were experienced teachers. 

Instrument 

The instrument for which passing scores were set, and by
which judges' achievement levels were determined, was the course
midterm examination, a 40-item, four-option, multiple-choice test
constructed by the course instructor (the second author). The
test covered such topics as the nature of the research process,
observation and measurement, sampling, and item analysis. The
exam has been revised over several years to reach a high degree
of content validity, and in its most recent administration showed
an internal consistency (KR20) reliability index of .82. Thus,
scores on the test are taken to be valid and reliable measures
of achievement.

Treatment Groups 

All students enrolled in the course wrote the midterm exam- 
ination as a regular course requirement. The exams routinely were 
graded and returned to the students for discussion in class. The 
students then were asked to participate in an exercise involving 
the use of the Nedelsky procedure to determine a passing score for 
the test. While participation in the exercise was voluntary, more 
than 95% of the students chose to participate. Of the 148 students
agreeing to participate, 30 were deleted from the study due to
failure to follow instructions, missing identification codes, or 
missing achievement data, leaving 118 students as the sample used 
in the experiment. Subjects were assigned randomly to groups, 
stratified by course section to control for possible differences 
among regional campuses. Then they were given copies of the test, 
along with detailed instructions on the Nedelsky procedure.
Instructions for the two groups differed only with respect to the
version of the procedure used.

Definition of Minimum Competence 

Minimum acceptable performance was defined for the subjects 
as the lowest level of performance on the test for which a grade 
of "B" would be awarded. This level was chosen as appropriate, 
since one of the requirements of the subjects' degree programs is
that a "B" average be maintained. For each incorrect response
option on the test, the subjects were instructed to respond to the
question "Should the student performing at the minimum acceptable
level (as defined above) be able to reject this option as
incorrect?" Spaces were provided for that purpose beside each
option. For the two-category version (Group A) of the procedure, 
the possible responses were "yes" and "no." The three-category 
version (Group B) also allowed "undecided" as a possible choice. 
In order to minimize any possible confounding effect produced by 
the subjects' knowledge of previously existing course standards, 
the subjects were not required to calculate their resulting 
Nedelsky passing scores; this was done by the authors. Each sub- 
ject responded individually; no attempt was made to determine con- 
sensus passing scores. 

Comparison Procedures 

The frequency distributions of passing scores produced by 
the two groups were compared using the Kolmogorov-Smirnov two- 
sample test, a broad test sensitive to any difference in the two 
distributions. The distributions of passing scores are given in 
Table 1. All passing scores were rounded upward to the nearest 
whole number, that is, the number of correctly-answered items 
necessary for an examinee to be classified as passing. Decision 
consistency was assessed via comparisons of the proportions of 
students writing the exam who were classified similarly by the two 
versions. Both the mean and median passing scores for each group 
were used in the comparisons. The results are shown in Table 2. 
Also, decisions based on the groups' passing scores were compared
with those based on the standard established by the course
instructor, as shown in Table 3. Finally, to assess the relationship
between judges' achievement and passing score, the Pearson
product-moment correlation coefficient was determined for the
subjects' examination grades and their Nedelsky passing scores.
For this calculation, the two groups were combined.

TABLE 1

Distributions of Passing Scores from Two Versions
of the Nedelsky Procedure

Passing       Frequency           Passing       Frequency
 Score     Group A  Group B        Score     Group A  Group B

  13          0        1            26          2        4
  14          1        0            27          1        0
  15          0        0            28          5        2
  16          2        1            29          4        4
  17          0        1            30          0        1
  18          1        0            31          3        5
  19          0        0            32          5        3
  20          3        1            33          2        3
  21          1        0            34          6       10
  22          1        0            35          6        5
  23          2        2            36          3        2
  24          2        4            37          3        5
  25          1        2            38          5        3

              N      MEAN     MEDIAN     S.D.

Group A      59     29.88     31.17      6.38
Group B      59     30.51     31.37      5.79

Kolmogorov-Smirnov D = .170 (p = .36)

3. RESULTS

The overall passing score distributions for the two groups,
displayed in Table 1, showed no significant difference (p = .36).
As can be seen in Table 2, the two forms also produced highly
consistent classification decisions. If the mean passing score
for each group is used as a standard, only 7 of 185 students taking
the test would have been classified differently, a percentage of
agreement of 96%. The exact median passing scores from the two
groups are 31.17 and 31.37, respectively. Rounding upward, both
these values become 32. Thus, use of the median passing score
produced the surprising result of complete agreement in
classification.

The fact that the two versions produce passing scores yielding
consistent decisions does not, in itself, mean that the scores are
useful in practice. But further comparisons of decisions based on
the Nedelsky passing scores with those based on standards
previously established by the course instructor (32 correct answers
for a grade of B) also show a high degree of agreement (Table 3).
Using the group mean passing score as the standard, 11 of 185 students
were classified differently by Group A (the two-category version)
and the course instructor's pre-set standard (percentage agreement
= 94%). For Group B (the three-category version), this percentage
was 98% (4 students classified differently). The group medians,
rounded up to 32, coincide exactly with the course instructor's
standard. Here again, use of the group medians produced complete
agreement.
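
The rounding and agreement calculations used in this section can be sketched as follows. This is an illustration only: the function names are introduced here, and the list of examinee scores is hypothetical, since the individual examination scores are not reported.

```python
import math
from typing import List, Sequence


def rounded_passing_score(raw_score: float) -> int:
    """Round a raw passing score upward to a whole number of items."""
    return math.ceil(raw_score)


def classify(scores: Sequence[int], passing_score: int) -> List[bool]:
    """True means pass: the examinee reached the passing score."""
    return [score >= passing_score for score in scores]


def proportion_consistent(decisions_a: Sequence[bool],
                          decisions_b: Sequence[bool]) -> float:
    """Share of examinees who receive the same decision under both standards."""
    agree = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return agree / len(decisions_a)


if __name__ == "__main__":
    # Group means and medians as reported in Table 1.
    mean_a, mean_b = rounded_passing_score(29.88), rounded_passing_score(30.51)
    median_a, median_b = rounded_passing_score(31.17), rounded_passing_score(31.37)
    print("mean-based standards:", mean_a, mean_b)
    print("median-based standards:", median_a, median_b)

    # Hypothetical examinee scores, used only to show the mechanics.
    scores = [25, 29, 30, 31, 32, 36]
    agreement = proportion_consistent(classify(scores, mean_a),
                                      classify(scores, mean_b))
    print(f"agreement under the mean-based standards: {agreement:.2f}")
```

Because both medians round up to the same whole number, any set of scores is classified identically under the two median-based standards, which is the complete agreement reported above.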

As was noted previously, subjects in both groups were com- 
bined to consider the relationship between judges' achievement and 
passing score. Such a relationship, if it exists, might be expect- 
ed to hold across methods; in any event, the demonstrated equiva- 
lence of the two forms suggests the reasonableness of combining the 
two groups. The linear correlation between achievement and passing 
score for the subjects of the study was .30 (p = .001). Thus
achievement in the subject matter area accounted for 9% of the ob- 
served variation in passing scores. 

4. DISCUSSION 

From the results of this study, the two- and three-category 
versions of the Nedelsky procedure yield equivalent results. 
The finding holds both in terms of the empirical distributions of 
passing scores, and of consistency in classification decisions. 
Additionally, there was a close correspondence both in distribu- 
tions of passing scores and in classification decisions between 
passing scores set by the subjects and the pre-set standard
established by the course instructor.

TABLE 2

Decision Consistency of Passing Scores from
Two Versions of the Nedelsky Procedure

Case I: Using the mean of several judges.

                    Group B
               fail     pass     Total

   fail         44        7        51
   pass          0      134       134
   Total        44      141       185

   Proportion of consistent decisions = (134 + 44)/185 = .96

Case II: Using the median of several judges.

                    Group B
               fail     pass     Total

   fail         55        0        55
   pass          0      130       130
   Total        55      130       185

   Proportion of consistent decisions = (130 + 55)/185 = 1.00

While either the mean or median of several judges' passing scores
could be used to set the final passing standard, the median, rather
than the mean, might be more appropriate. The median's resistance
to the influence of extreme scores would seem to reduce some of the
effect of variability in passing scores from a group of judges.

Some variation was observed in the scores from both groups of
judges. The slightly smaller standard deviation of passing scores 
from Group B, using the three-category version of the procedure, 
might be a point in favor of the use of that version. The signi- 
ficant positive correlation between judges' achievement and pass- 
ing score indicates that at least a small portion of the observed 
variation in passing scores was related systematically to a 
characteristic of the judges. Other relevant characteristics might 
be identified which also relate systematically to judges' passing 
scores. Knowledge of these characteristics and their relationship 
to passing scores could lead to their elimination, control, or 
utilization in the standard-setting process. This knowledge would 
make the setting of passing scores on the basis of expert judgement 
a more objective process. 

In conclusion, this study has shown that the two versions of
the Nedelsky procedure considered here produce equivalent passing 
scores. Also, it was shown that the passing scores set by differ- 
ent judges were related positively to the judges' own achievement. 
It should be noted that the study involved the setting of passing 
scores for a single test, using as judges students who took the 
test but who were not responsible for constructing it. Further, 
such judges are not likely to have the broad knowledge of other 
students, of how such tested content fits into the total curri- 
culum, and of the subject-matter itself which, say, faculty 
members might have. It is an open question whether faculty 
members would tend to show the same pattern of consistency in 
applying the two Nedelsky methods. Thus the observed results must 
be seen as suggestive rather than conclusive. However, given the
results of this study, a choice between the two versions
justifiably could be made on practical grounds, such as the
preference of the judges.

TABLE 3

Decision Consistency of Course Instructor's Standard with
Passing Scores from Two Versions of the Nedelsky Procedure

Case I: Using the mean of several judges.

Instructor's              Group A                    Group B
Pre-set               fail   pass   Total        fail   pass   Total
Standard

   fail                44     11      55          51      4      55
   pass                 0    130     130           0    130     130
   Total               44    141     185          51    134     185

   Proportion of consistent decisions:
      Group A: (130 + 44)/185 = .94
      Group B: (130 + 51)/185 = .98

Case II: Using the median of several judges.

Instructor's              Group A                    Group B
Pre-set               fail   pass   Total        fail   pass   Total
Standard

   fail                55      0      55          55      0      55
   pass                 0    130     130           0    130     130
   Total               55    130     185          55    130     185

   Proportion of consistent decisions:
      Group A: (130 + 55)/185 = 1.00
      Group B: (130 + 55)/185 = 1.00



ABSTRACT

A general model along with four illustrations is presented for 
the consideration of budgetary constraints in the setting of passing 
scores in instructional programs involving remedial action for poor 
test performers. Budgetary constraints normally put an upper limit 
on any choice of passing score. Given relevant information, this 
limit may be determined. Alternately, ways to assess the budgetary 
consequences associated with a given passing score are provided. 
Such information would be useful in any final decision regarding the 
passing score. 



1. INTRODUCTION 

In many instructional programs, such as Individually Prescribed 
Instruction (Glaser, 1968) or others of a similar nature (Atkinson, 
1968; Flanagan, 1967), testing is conducted at the end of every 
instructional unit to provide feedback to the student and/or teacher 
in order that appropriate action can be taken. If a student's test
score is high, it may be reasonable to grant that student mastery
of the current unit and to allow him to proceed to a subsequent
unit. On the other hand, a low score may indicate that the student
might benefit from some remedial action. This is also the case for
certification testing such as high school graduation or for minimum
competency testing as legislated in several states. Funds are
usually allocated for remediation for students whose scores are too
low to warrant mastery of the competencies under consideration.

This paper has been distributed separately as JW 79-3, April, 1979.

The statistical issues relating to granting or denying mastery 
status have been approached by several writers, including Huynh 
(1976, 1977, 1978). Most proposed schemes are by and large
quota-free, i.e., the mastery/nonmastery decision process considered by
the writers does not take into account the budgetary consequences
associated with the denial of mastery status. If funds provided
for remediation are limited, then a constraint will have to be
imposed on the number of students declared as failures (nonmasters).

The purpose of this paper is to demonstrate how budgetary 
restrictions may be taken into account in the process of setting 
passing (mastery) scores or performance standards. Alternately, 
the presentation provides ways to assess the budgetary consequences 
associated with an arbitrary passing score. Section 2 describes 
the overall framework. Illustrations based on the beta-binomial 
and normal-normal test score models will be provided in subsequent 
sections. 

2. OVERALL FRAMEWORK 

It is now assumed that the true ability of a population of
subjects may be described by a random variable \(\theta\) which ranges in
the sample space \(\Omega\). For the beta-binomial model, \(\theta\) is the
proportion of items in an item pool that a subject answers correctly,
and \(\Omega\) is the interval from 0 to 1. For the normal test score
model, \(\theta\) is the traditional true score (Lord & Novick, 1968) and
\(\Omega\) is the entire real line. Let the probability density function
(pdf) of \(\theta\) be \(p(\theta)\).

Let \(x\) be the score obtained from the administration of an
\(n\)-item test and let \(f(x)\) and \(f(x \mid \theta)\) denote its marginal and
conditional probability density functions with respect to \(\theta\).

It shall be assumed that all subjects with test scores smaller 
than a passing (mastery) score c will be denied mastery for the 
instructional objectives covered by the test and that these subjects 
will be provided with appropriate remedial learning activities. 
The remediation is assumed to be so devised that its conclusion 
will coincide with the mastery status which was previously denied
the student. The cost of remediation will be assumed to be a
nonincreasing function of \(\theta\) and will be denoted as \(\delta(\theta)\). Thus,
remediation will cost less for more able students than it will for 
less able ones. 

Consider now a subject with true ability \(\theta\). The probability
that this person will be declared in need of remediation is given
as the sum \(\sum f(x \mid \theta)\) or the integral \(\int f(x \mid \theta)\,dx\),
with \(x < c\). For the purposes of this section, the summation
notation will be used. It follows that the (conditional) expected
remediation cost for this subject is

\[ \sum_{x<c} f(x \mid \theta)\,\delta(\theta) . \]

Hence the (unconditional or marginal) expected remediation cost for
a subject drawn randomly from the population is

\[ \gamma(c) = \int_{\Omega} \sum_{x<c} f(x \mid \theta)\,\delta(\theta)\,p(\theta)\,d\theta . \tag{1} \]

This function is nondecreasing with respect to its argument \(c\). Its
lowest limit is zero (when all subjects are granted mastery status)
and its maximum value, \(\gamma_{\max} = \int_{\Omega} \delta(\theta)\,p(\theta)\,d\theta\), is reached when
remediation is provided to all subjects regardless of their test
scores.

Let us suppose, furthermore, that testing is to be conducted
for a total of \(m\) subjects and the total cost of possible remediation
cannot exceed the value \(B\). If the passing score \(c\) is selected, then
the total expected remediation cost will be \(m\gamma(c)\). Hence any choice
for \(c\) must satisfy the budgetary constraint \(m\gamma(c) \le B\). If
\(m\gamma_{\max} \le B\), any cutoff score will be acceptable. However, if
\(B < m\gamma_{\max}\), then the passing score \(c\) must be less than or equal
to \(c_1\), where \(c_1\) is the highest score satisfying the inequality

\[ \gamma(c_1) \le B/m . \tag{2} \]

For discrete test scores, such as those of the binomial error model,
Inequality (2) may be solved by computing the values of \(\gamma(c)\) one by
one, starting with \(c\) as the smallest test score, and stopping when
the value \(c_1\) is reached. For continuous test data, numerical
procedures for solving the nonlinear equation \(\gamma(c_1) = B/m\) might be
needed.

3. THE BETA-BINOMIAL MODEL WITH CONSTANT COSTS 

Consider now the beta-binomial model as defined by the following
pdf's:

\[ f(x \mid \theta) = \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}, \qquad x = 0, 1, \ldots, n \]

and

\[ p(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}, \qquad 0 < \theta < 1 . \]

The two parameters \(\alpha\) and \(\beta\) may be estimated from sample data via
one of several estimation techniques such as the moment procedure
or the maximum likelihood procedure. Let \(\bar{x}\) and \(s\) be the sample
test score mean and standard deviation. In addition, let \(\alpha_{21}\) be
the KR21 reliability coefficient as defined by

\[ \alpha_{21} = \frac{n}{n-1}\left( 1 - \frac{\bar{x}(n-\bar{x})}{n s^{2}} \right) . \tag{3} \]

(In the case of a negative \(\alpha_{21}\), simply replace the value computed
from Equation (3) by any positive reliability estimate.) The moment
estimates for \(\alpha\) and \(\beta\) are given as

\[ \hat{\alpha} = (-1 + 1/\alpha_{21})\,\bar{x} \tag{4} \]

\[ \hat{\beta} = -\hat{\alpha} + n/\alpha_{21} - n . \tag{5} \]

We will now focus on the simple case where a single true passing
score (or criterion level) \(\theta_0\), separating true masters from
true nonmasters, has been specified. Let the remediation cost be
constant and equal to \(\gamma_0\) for a true nonmaster and \(\gamma_1\) for a true
master. Thus the cost function is of the form

\[ \delta(\theta) = \begin{cases} \gamma_0 & \text{if } \theta < \theta_0 \\ \gamma_1 & \text{if } \theta \ge \theta_0 . \end{cases} \]

The nonincreasing nature of \(\delta(\theta)\) is satisfied whenever \(\gamma_0 > \gamma_1\).

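The expected remediation cost per student as shown by Equation (1)
is now given as

\[ \gamma(c) = \frac{1}{B(\alpha,\beta)} \sum_{x=0}^{c-1} \binom{n}{x}
   \left( \gamma_0 \int_{0}^{\theta_0} \theta^{\alpha+x-1}(1-\theta)^{n+\beta-x-1}\,d\theta
        + \gamma_1 \int_{\theta_0}^{1} \theta^{\alpha+x-1}(1-\theta)^{n+\beta-x-1}\,d\theta \right) \]

or

\[ \gamma(c) = \frac{1}{B(\alpha,\beta)} \sum_{x=0}^{c-1} \binom{n}{x}
   \left( \gamma_1 B(\alpha+x, n+\beta-x)
        + (\gamma_0 - \gamma_1) \int_{0}^{\theta_0} \theta^{\alpha+x-1}(1-\theta)^{n+\beta-x-1}\,d\theta \right) . \]

It may be noted that the marginal beta-binomial pdf of \(x\) is given as

\[ f(x) = \binom{n}{x} B(\alpha+x, n+\beta-x) / B(\alpha,\beta) \tag{6} \]

and that the incomplete beta function \(I(\alpha+x, n+\beta-x; \theta_0)\) is defined as

\[ I(\alpha+x, n+\beta-x; \theta_0) = \int_{0}^{\theta_0} \theta^{\alpha+x-1}(1-\theta)^{n+\beta-x-1}\,d\theta \,/\, B(\alpha+x, n+\beta-x) . \]

It follows that

\[ \gamma(c) = \sum_{x=0}^{c-1} f(x) \left( \gamma_1 + (\gamma_0 - \gamma_1)\, I(\alpha+x, n+\beta-x; \theta_0) \right) . \tag{7} \]

The values of \(f(x)\) may be computed via the following inductive
formulae:

\[ f(0) = B(\alpha, n+\beta) / B(\alpha,\beta) \tag{8} \]

and

\[ f(x+1) = f(x) \cdot \frac{(n-x)(\alpha+x)}{(x+1)(n+\beta-x-1)}, \qquad x = 0, 1, \ldots, n-1 . \tag{9} \]

The following recurrence formula, on the other hand, will quicken
the evaluation of the incomplete beta functions:

(10)

Finally, as in Section 2, let \(B\) be the maximum funds allocated
for possible remediation involving a group of \(m\) subjects. Then the
passing score cannot exceed the highest integer \(c_1\) at which
\(\gamma(c_1) \le B/m\).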
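
Before the limit on the passing score can be computed, the parameters \(\alpha\) and \(\beta\) have to be estimated. The moment procedure of Equations (3) to (5) can be sketched as follows. The function names are introduced here for illustration, and the choice to raise an error when the KR21 value is not positive is an assumption of this sketch; the text leaves the choice of a substitute positive reliability estimate to the user.

```python
import math
from typing import Tuple


def kr21(n_items: int, mean: float, sd: float) -> float:
    """KR21 reliability: n/(n-1) * (1 - mean*(n - mean) / (n * sd**2))."""
    return (n_items / (n_items - 1)) * (
        1.0 - mean * (n_items - mean) / (n_items * sd ** 2)
    )


def moment_estimates(n_items: int, mean: float, sd: float) -> Tuple[float, float]:
    """Moment estimates of the beta-binomial parameters (alpha, beta)."""
    a21 = kr21(n_items, mean, sd)
    if a21 <= 0:
        # The text suggests substituting any positive reliability estimate;
        # this sketch leaves that decision to the caller.
        raise ValueError("KR21 is not positive; supply another reliability estimate")
    alpha = (-1.0 + 1.0 / a21) * mean
    beta = -alpha + n_items / a21 - n_items
    return alpha, beta


if __name__ == "__main__":
    # Hypothetical sample statistics for a 5-item test: mean 3, variance 2.
    alpha, beta = moment_estimates(5, 3.0, math.sqrt(2.0))
    print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

With a 5-item test, a mean of 3, and a variance of 2, the KR21 coefficient is .50 and the estimates come out at \(\alpha = 3\) and \(\beta = 2\).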

Numerical Example 1 

A maximum sum of \(B\) = $4000 has been allocated for remediation
in an instructional program with \(m = 100\) students. Thus \(B/m\) = $40.
For the program under study, assume that \(\theta_0 = .60\) and the
remediation costs are \(\gamma_0\) = $150 for each student with true ability
\(\theta < .60\) and \(\gamma_1\) = $50 for students with \(\theta \ge .60\). Now suppose
a 5-item test is administered and the test scores yield the
estimates \(\alpha = 3\) and \(\beta = 2\). At the passing scores \(c\) = 1, 2, 3, 4,
and 5, the expected remediation costs \(\gamma(c)\) are $7.02, $19.06,
$31.83, $41.25, and $47.19, respectively. Since \(\gamma(c_1)\) must not
exceed $40, it follows that \(c_1 = 3\). The budget constraint imposes
an upper limit of 3 on the passing score. If 3 is used, the expected
cost of remediation amounts to $3183. If the next higher passing
score, 4, were used, the expected remediation cost would be $4125,
over the maximum budgeted sum of $4000.
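
The one-by-one search for the limit \(c_1\) can be sketched as below. The function name and the convention that position \(c\) of the list holds \(\gamma(c)\), with \(\gamma(0) = 0\), are assumptions made for this illustration; the cost figures are those reported in Numerical Example 1 and are taken as given rather than recomputed.

```python
from typing import Optional, Sequence


def highest_affordable_passing_score(expected_costs: Sequence[float],
                                     budget_per_student: float) -> Optional[int]:
    """Largest c with expected_costs[c] <= budget_per_student.

    expected_costs[c] is the expected remediation cost per student when the
    passing score is c; the values are assumed to be nondecreasing in c.
    Returns None if even the smallest passing score exceeds the budget.
    """
    best = None
    for c, cost in enumerate(expected_costs):
        if cost > budget_per_student:
            break  # costs only grow from here on, so stop searching
        best = c
    return best


if __name__ == "__main__":
    # gamma(0) = 0 (everyone passes), then the values reported for c = 1..5.
    costs = [0.0, 7.02, 19.06, 31.83, 41.25, 47.19]
    m, budget = 100, 4000.0
    c1 = highest_affordable_passing_score(costs, budget / m)
    print("highest affordable passing score:", c1)
    print("total expected remediation cost:", round(m * costs[c1], 2))
```

Stopping at the first cost that exceeds \(B/m\) relies on \(\gamma(c)\) being nondecreasing, which Section 2 establishes.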

4. THE BETA-BINOMIAL MODEL WITH LINEAR COSTS 

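Let us suppose now that the cost function may be written as

\[ \delta(\theta) = (\gamma_0 - \gamma_1)(1 - \theta) + \gamma_1 , \tag{11} \]

in which \(\gamma_1 < \gamma_0\). Thus the cost is a linear function of \(\theta\). It is
equal to \(\gamma_0\) when \(\theta = 0\) and to \(\gamma_1\) when \(\theta = 1\).

Under the beta-binomial model as described in the first paragraph
of Section 3, the expected cost per student is given as

\[ \gamma(c) = \frac{1}{B(\alpha,\beta)} \sum_{x=0}^{c-1} \binom{n}{x}
   \left[ (\gamma_0 - \gamma_1)\, B(\alpha+x, n+\beta-x+1) + \gamma_1 B(\alpha+x, n+\beta-x) \right] . \]

By noting that

\[ B(\alpha+x, n+\beta-x+1) = B(\alpha+x, n+\beta-x)\,\frac{n+\beta-x}{\alpha+\beta+n} , \]

it may be deduced that

\[
\begin{aligned}
\gamma(c) &= \sum_{x=0}^{c-1} f(x) \left[ \frac{(\gamma_0 - \gamma_1)(n+\beta-x)}{\alpha+\beta+n} + \gamma_1 \right] \\
          &= \sum_{x=0}^{c-1} f(x)\, \frac{\gamma_0 (n+\beta-x) + \gamma_1 (\alpha+x)}{\alpha+\beta+n} .
\end{aligned}
\tag{12}
\]

As in the previous section, the values of \(f(x)\) may be computed
inductively via Equations (8) and (9).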

Numerical Example 2 

Consider the basic data of the first numerical example, namely
\(B/m\) = $40, \(\gamma_0\) = $150, \(\gamma_1\) = $50, \(\alpha = 3\), \(\beta = 2\), and \(n = 5\) items. At
the passing scores of 1, 2, 3, 4, and 5, the expected remediation
costs \(\gamma(c)\) are $5.71, $18.81, $37.86, $59.29, and $78.33. Hence
the passing score cannot exceed 3, where the maximum value of the 
expected cost of remediation would amount to $3786. Had a score 
of 4 been selected, the expected cost would have amounted to as 
much as $5929. 
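
The figures of this example can be checked with a few lines of code. The following is a minimal Python sketch under stated assumptions, not the report's own program: the function names are illustrative, the density f(x) is generated with the usual forward recurrence for the beta-binomial model, and exact fractions are used only to keep rounding noise out of the comparison.

```python
from fractions import Fraction

def beta_binomial_pmf(n, alpha, beta):
    """Return [f(0), ..., f(n)] of the beta-binomial density (forward recurrence)."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    f0 = Fraction(1)
    for i in range(n):  # f(0) = prod_{i=0}^{n-1} (beta + i) / (alpha + beta + i)
        f0 *= (beta + i) / (alpha + beta + i)
    f = [f0]
    for x in range(n):  # step from f(x) to f(x + 1)
        f.append(f[x] * (n - x) * (alpha + x) / ((x + 1) * (n + beta - x - 1)))
    return f

def expected_cost_linear(c, n, alpha, beta, g0, g1):
    """Expected remediation cost per student at passing score c, as in Equation (12)."""
    f = beta_binomial_pmf(n, alpha, beta)
    total = sum(f[x] * (g0 * (n + beta - x) + g1 * (alpha + x)) for x in range(c))
    return total / (n + alpha + beta)

def max_passing_score(n, alpha, beta, g0, g1, budget_per_student):
    """Largest c in 1..n whose expected cost stays within the per-student budget."""
    feasible = [c for c in range(1, n + 1)
                if expected_cost_linear(c, n, alpha, beta, g0, g1) <= budget_per_student]
    return max(feasible) if feasible else None

costs = [float(expected_cost_linear(c, 5, 3, 2, 150, 50)) for c in range(1, 6)]
print([round(v, 2) for v in costs])             # expected costs for c = 1, ..., 5
print(max_passing_score(5, 3, 2, 150, 50, 40))  # highest admissible passing score
```

With the data of the example (n = 5, \(\alpha = 3\), \(\beta = 2\), costs of $150 and $50, and B/m = $40), the sketch reproduces the five expected costs listed above and the limit of 3 on the passing score.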

To close this section, it should be mentioned that simple
expressions for \(\gamma(c)\) such as the one of Equation (12) may be
worked out for all cost functions \(\delta(\theta)\) which can be
represented as integral polynomials of \(\theta\).

5. THE BIVARIATE NORMAL MODEL WITH CONSTANT COSTS 

Now consider the case where the true score \(\theta\) and the observed
score x are jointly distributed according to a bivariate normal
distribution. Without any loss of generality, it may be assumed
that x is in its standardized form with zero mean and unit variance.
Let \(\rho\) be the reliability of the test for the normal population of
subjects under consideration. The true score \(\theta\) has a mean of
zero, a standard deviation of \(\sqrt{\rho}\), and a correlation of
\(\sqrt{\rho}\) with the test score x. The joint pdf of x and \(\theta\) is

\[ f(x,\theta) = \frac{1}{2\pi\sqrt{\rho(1-\rho)}} \exp\left[ -\frac{x^2}{2} - \frac{(\theta-\rho x)^2}{2\rho(1-\rho)} \right]. \tag{13} \]



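As in Section 3, it will be assumed that the cost function
\(\delta(\theta)\) is constant, taking the value of \(\gamma_0\) for
\(\theta < \theta_0\) and the value of \(\gamma_1\) for
\(\theta \ge \theta_0\). It follows from Equation (1) that at any passing
score c, the remediation cost for a subject drawn randomly from
the population is expected to be

\[ \begin{aligned} \gamma(c) &= \gamma_0 \int_{-\infty}^{c}\int_{-\infty}^{\theta_0} f(x,\theta)\,d\theta\,dx + \gamma_1 \int_{-\infty}^{c}\int_{\theta_0}^{+\infty} f(x,\theta)\,d\theta\,dx \\ &= \gamma_1 \Pr(x < c) + (\gamma_0-\gamma_1) \int_{-\infty}^{c}\int_{-\infty}^{\theta_0} f(x,\theta)\,d\theta\,dx. \end{aligned} \tag{14} \]

The maximum passing score \(c_1\) satisfies the equation
\(\gamma(c_1) = B/m\). This value of \(c_1\) exists as long as
\(B/m < \gamma_{\max}\), where

\[ \gamma_{\max} = \gamma_0 \Pr(\theta < \theta_0) + \gamma_1 \Pr(\theta \ge \theta_0). \]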

Solutions may be found via numerical procedures such as the
Newton iterative solution for nonlinear equations. To apply this
technique, it may be noted that the derivative of \(\gamma(c)\) with
respect to c is

\[ \gamma'(c) = \gamma_1 f_N(c) + (\gamma_0-\gamma_1) \int_{-\infty}^{\theta_0} f(c,\theta)\,d\theta, \]

where \(f_N(\cdot)\) denotes the pdf of x (the unit normal variable). In
other words,

\[ f_N(c) = \frac{1}{\sqrt{2\pi}}\, e^{-c^2/2}. \]

It may also be noted that

\[ \int_{-\infty}^{\theta_0} f(c,\theta)\,d\theta = f_N(c) \cdot F_N\!\left[ \frac{\theta_0-\rho c}{\sqrt{\rho-\rho^2}} \right], \]

where \(F_N(\cdot)\) is the (cumulative) distribution function of the unit
normal variable. In summary,

\[ \gamma'(c) = f_N(c) \left\{ \gamma_1 + (\gamma_0-\gamma_1)\, F_N\!\left[ \frac{\theta_0-\rho c}{\sqrt{\rho-\rho^2}} \right] \right\}. \tag{15} \]

Both \(\gamma(c)\) and \(\gamma'(c)\) may be evaluated via computer programs
such as those described in the IMSL (1977). They may also be obtained
by use of appropriate tables for the univariate and bivariate normal
distributions.

Numerical Example 3 

Let the parameters defining the problem be \(\rho = .64\),
\(\theta_0 = 1\), \(\gamma_0\) = $150, \(\gamma_1\) = $50, and B/m = $40.
A numerical procedure yields the maximum standardized passing score
\(c_1 = -.475\). If the test scores have a mean of 50 and a standard
deviation of 20, then the passing score cannot exceed 40.5.

6. THE BIVARIATE NORMAL MODEL 
WITH NORMAL-OGIVE COST 



Now consider the case where the cost function \(\delta(\theta)\) is of
the form

\[ \delta(\theta) = (\gamma_0-\gamma_1) \left\{ 1 - F_N\!\left[ \frac{\theta-\theta_0}{\sigma} \right] \right\} + \gamma_1, \tag{16} \]

where, as before, \(F_N(\cdot)\) represents the distribution function of
the unit normal variable. In the context of decision theory,
expressions similar to those of Equation (16) have been proposed as
utility functions (e.g., Lindley, 1976, and Novick and Lindley,
1978). As in the case of the beta-binomial model with linear costs,
\(\gamma_0\) and \(\gamma_1\) represent the remediation costs associated
with the least able (\(\theta = -\infty\)) and the most able
(\(\theta = +\infty\)) subjects. On the other hand, the parameter
\(\theta_0\) is the location at which the cost is
\((\gamma_0+\gamma_1)/2\), and \(\sigma\) indicates the extent to which
\(\delta(\theta)\) decreases at this location.

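The expected remediation cost \(\gamma(c)\) may now be written as

\[ \gamma(c) = \int_{-\infty}^{c}\int_{-\infty}^{+\infty} f(x,\theta)\,\delta(\theta)\,d\theta\,dx = \gamma_0 \Pr(x < c) - (\gamma_0-\gamma_1) \int_{-\infty}^{c} \phi(x)\, f_N(x)\,dx, \tag{17} \]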



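where

\[ \phi(x) = \int_{-\infty}^{+\infty} f(\theta|x)\, F_N\!\left[ \frac{\theta-\theta_0}{\sigma} \right] d\theta. \]

The conditional pdf \(f(\theta|x)\) is given as

\[ f(\theta|x) = \frac{1}{\sqrt{2\pi\rho(1-\rho)}} \exp\left[ -\frac{(\theta-\rho x)^2}{2\rho(1-\rho)} \right]. \]

It follows that

\[ \phi(x) = \frac{1}{2\pi\sigma\sqrt{\rho(1-\rho)}} \int_{-\infty}^{+\infty} \left\{ \exp\left[ -\frac{(\theta-\rho x)^2}{2\rho(1-\rho)} \right] \int_{-\infty}^{\theta} \exp\left[ -\frac{(t-\theta_0)^2}{2\sigma^2} \right] dt \right\} d\theta. \]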



It should be noted that the expression

\[ \frac{1}{2\pi\sigma\sqrt{\rho(1-\rho)}} \exp\left[ -\frac{(\theta-\rho x)^2}{2\rho(1-\rho)} - \frac{(t-\theta_0)^2}{2\sigma^2} \right] \]

acts as the joint pdf of two independent normal random variables
\(\theta\) and t with means \(\rho x\) and \(\theta_0\), and with variances
\(\rho(1-\rho)\) and \(\sigma^2\).

Now let us introduce the new random variable \(u = \theta - t\) for
which the mean is \(\rho x - \theta_0\) and the variance is
\(\rho - \rho^2 + \sigma^2\). Since the condition \(t < \theta\) is
equivalent to \(u > 0\), it follows that \(\phi(x)\) may be expressed
simply as

\[ \phi(x) = \int_{0}^{\infty}\int_{-\infty}^{\infty} g_{\theta u}(\theta,u)\,d\theta\,du, \]

where \(g_{\theta u}(\theta,u)\) is the bivariate normal pdf of \(\theta\)
and u. Hence

\[ \phi(x) = \Pr(u > 0) = 1 - \Pr(u < 0) \]

or

\[ \phi(x) = 1 - F_N\!\left[ \frac{\theta_0-\rho x}{\sqrt{\rho-\rho^2+\sigma^2}} \right]. \tag{18} \]

With this new expression for \(\phi(x)\), the expected remediation
cost as defined in Equation (17) may be written as

\[ \gamma(c) = \gamma_1 \Pr(x < c) + (\gamma_0-\gamma_1) \int_{-\infty}^{c} F_N\!\left[ \frac{\theta_0-\rho x}{\sqrt{\rho-\rho^2+\sigma^2}} \right] f_N(x)\,dx. \tag{19} \]

The integral found in Equation (19) may be written as

\[ Z(c) = \int_{-\infty}^{c}\int_{-\infty}^{h(x)} f_N(w)\, f_N(x)\,dw\,dx, \]

where \(h(x) = (-\rho x+\theta_0)/\sqrt{\rho-\rho^2+\sigma^2}\), and
\(f_N(\cdot)\) is again the pdf of a unit normal variable. Let

\[ v = w - h(x) = w + (\rho x-\theta_0)/\sqrt{\rho-\rho^2+\sigma^2}. \]

Then x and v follow a joint bivariate normal pdf, \(g_{xv}(x,v)\), with
means, variances, and correlation given, respectively, as

\[ \mu_x = 0, \qquad \mu_v = -\theta_0/\sqrt{\rho-\rho^2+\sigma^2}, \qquad \sigma_x^2 = 1, \qquad \sigma_v^2 = (\rho+\sigma^2)/(\rho-\rho^2+\sigma^2), \tag{20} \]

and

\[ \rho_{xv} = \rho/\sqrt{\rho+\sigma^2}. \]



Hence the integral Z(c) takes a simpler form given as

\[ Z(c) = \int_{-\infty}^{c}\int_{-\infty}^{0} g_{xv}(x,v)\,dv\,dx, \]

and the expected remediation cost \(\gamma(c)\) may be written as

\[ \gamma(c) = \gamma_1 \Pr(x < c) + (\gamma_0-\gamma_1) \int_{-\infty}^{c}\int_{-\infty}^{0} g_{xv}(x,v)\,dv\,dx. \tag{21} \]

The numerical values of \(\gamma(c)\) may be computed via tables or
computer programs dealing with the univariate and bivariate normal
distributions.

Numerical procedures such as the Newton iteration process may
be used to solve the equation \(\gamma(c) = B/m\). The derivative of
\(\gamma(c)\) with respect to c, from Equation (19), is found to be

\[ \gamma'(c) = f_N(c) \left\{ \gamma_1 + (\gamma_0-\gamma_1)\, F_N\!\left[ \frac{\theta_0-\rho c}{\sqrt{\rho-\rho^2+\sigma^2}} \right] \right\}. \tag{22} \]

It may be noted that by taking \(\sigma^2 = 0\), Equations (19) and (22)
of this section will reduce to Equations (14) and (15) of Section 5.
This is expected since the normal-ogive cost function \(\delta(\theta)\)
as defined in (16) will degenerate into the constant cost function of
Section 5 when \(\sigma^2\) tends to zero. Finally, the maximum expected
remediation cost (per random subject) may be deduced from Equation
(21) by letting \(c = +\infty\). It is

\[ \gamma_m = \gamma_1 + (\gamma_0-\gamma_1)\, F_N\!\left[ \frac{\theta_0}{\sqrt{\rho+\sigma^2}} \right]. \tag{23} \]

Numerical Example 4 

Let the parameters of the problem be \(\rho = .64\), \(\theta_0 = 1\),
\(\sigma = 2\), \(\gamma_0\) = $150, \(\gamma_1\) = $50, and B/m = $40. The
Newton iteration procedure for solving the equation
\(\gamma(c_1) = B/m\) yields the solution \(c_1 = -.362\). If the test
scores have a mean of 50 and a standard deviation of 20, then the test
passing score cannot exceed 42.76.

7. SOME CONCLUDING REMARKS 

In this paper a general model along with four separate illus-
trations is provided for the consideration of budgetary constraints
in the setting of passing scores in instructional programs involv-
ing remediation for subjects with poor test performance. The
illustrations are not meant to be exhaustive. Budgetary constraints 
normally impose a limit on the number of students allowed to take 
remedial learning activities and, hence, restrict the range in 
which a choice for the passing score is to be made. The paper also 
provides ways to assess the budgetary requirement associated with 
each passing score. This information would be a factor in deci- 
sions regarding passing scores and budgets for remediation. 
ABSTRACT

In mastery testing the raw agreement index and the kappa index 
may be secured via one test administration when the test scores 
follow beta-binomial distributions. This paper reports tables and 
a computer program which facilitate the computation of those indi- 
ces and of their standard errors of estimate. Illustrations are 
provided in the form of confidence intervals, hypothesis testing, 
and minimum sample sizes in reliability studies for mastery tests. 



1. INTRODUCTION

As indicated by several writers including Carver (1970) and
Hambleton and Novick (1973), one of the uses of criterion-referenced
testing is to classify examinees in two or more achievement cate-
gories. In this context, referred to here as mastery testing,
reliability would be most appropriately viewed as classification 
(or decision) consistency across repeated test administrations 
using the same form or two equivalent forms. Decision consistency 

This paper has been distributed separately as RM 78-1, December, 1079. 
may be quantified by the raw agreement index p which expresses the
proportion of examinees classified in the same category by both
testings. When the two test administrations yield equivalent (or
exchangeable) test data, p is bounded from below by \(p_c\), the
proportion of consistent decisions which would be expected if no
relationship existed between the two sets of data (Huynh, 1976, 1978).
In other words, \(p_c \le p \le 1\). In a number of instances, for
example when decision consistency is to be compared for two testing
situations involving different \(p_c\) values, it would be suitable to
scale p so that it forms an index with a range from 0 to 1. The kappa
coefficient (Cohen, 1960), as defined by
\(\kappa = (p-p_c)/(1-p_c)\), is such an index. This coefficient
represents the extent of improvement in decision consistency which is
reflected by the dependency between two equivalent sets of data.

The definitions of both p and kappa include the notion of 
repeated testings. However, there are at least two procedures by 
which p and kappa may be approximated via test data collected from 
one test administration (Huynh, 1976; Subkoviak, 1976). The 
Subkoviak procedure relies on the estimation of the true score for
each individual examinee. When combined with the binomial or com-
pound binomial error model, the estimated true score will yield a
consistency index for each examinee. The average of this index
over a population of examinees is the Subkoviak estimate for p.

The Huynh method, on the other hand, assumes that test scores
on one form follow a beta-binomial model and test scores on both 
forms distribute jointly as a bivariate beta-binomial distribution. 
Both p and kappa (and other similar indices) may then be computed 
via the univariate and bivariate distributions. In a simulation 
study based on real test data, Subkoviak (1978) concluded that "all 
things considered, the Huynh approach seems worthy of recommenda- 
tion. It is mathematically sound, requires only one testing, and 
provides reasonably accurate estimates, which appear to be slightly 
conservative for short tests" (p. 115). 

This paper will consider only the Huynh procedure for the
approximation of p and kappa. Section 2 will provide a review of
the computation of p and kappa. Section 3 will present formulae
for computing the asymptotic standard errors of their estimates.
Section 4 will describe the arrangement of the tables regarding p
and kappa and their standard errors. Section 5 describes the
interpolation process for nontabulated entries. Some applications
of the tables will be presented in Section 6. The last two
sections deal with a computer program for the estimates and their
standard errors.

2. COMPUTATIONS FOR p AND \(\kappa\)

Consider now the administration of an n-item test to a population
of examinees with true ability distributed according to the
beta density with parameters \(\alpha\) and \(\beta\). The frequency
distribution of the observed test score x is given by the
beta-binomial (or negative hypergeometric) density

\[ f(x) = \binom{n}{x} B(\alpha+x,\, n+\beta-x)/B(\alpha,\beta). \tag{1} \]

In this formula as well as in all other subsequent ones, the
notation B denotes the beta function. The density f(x) may be
computed via any of the following inductive formulae:

\[ f(0) = \prod_{i=1}^{n} \frac{n+\beta-i}{n+\alpha+\beta-i}, \qquad f(x+1) = f(x) \cdot \frac{(n-x)(\alpha+x)}{(x+1)(n+\beta-x-1)}, \quad x = 0, 1, \ldots, n-1; \tag{2} \]

or

\[ f(n) = \prod_{i=1}^{n} \frac{n+\alpha-i}{n+\alpha+\beta-i}, \qquad f(x-1) = f(x) \cdot \frac{x(n+\beta-x)}{(n-x+1)(\alpha+x-1)}, \quad x = 1, \ldots, n. \tag{3} \]

The first recurrence scheme is more efficient for small test scores
whereas the second set works better for large test scores.
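
The two recurrence schemes can be sketched in a few lines. The following Python fragment is an illustrative sketch only (the function names are assumptions, not part of the original FORTRAN program); it builds the whole density once from each end so that the two schemes can be compared.

```python
def pmf_forward(n, alpha, beta):
    """Beta-binomial density via scheme (2): start at f(0) and step upward."""
    f0 = 1.0
    for i in range(1, n + 1):
        f0 *= (n + beta - i) / (n + alpha + beta - i)
    f = [f0]
    for x in range(n):
        f.append(f[x] * (n - x) * (alpha + x) / ((x + 1) * (n + beta - x - 1)))
    return f

def pmf_backward(n, alpha, beta):
    """Beta-binomial density via scheme (3): start at f(n) and step downward."""
    f = [0.0] * (n + 1)
    f[n] = 1.0
    for i in range(1, n + 1):
        f[n] *= (n + alpha - i) / (n + alpha + beta - i)
    for x in range(n, 0, -1):
        f[x - 1] = f[x] * x * (n + beta - x) / ((n - x + 1) * (alpha + x - 1))
    return f

forward = pmf_forward(5, 3.0, 2.0)
backward = pmf_backward(5, 3.0, 2.0)
print(forward)
print(max(abs(a - b) for a, b in zip(forward, backward)))  # agreement of the two schemes
```

Both functions return the full list f(0), ..., f(n); in practice only the terms on the relevant side of the cutoff score need to be generated, which is where the choice between the two schemes matters.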

Let x and y be the test scores obtained by administering two
equivalent n-item tests to each examinee in the population. Under
local independence with respect to true ability, x and y follow the
bivariate beta-binomial (or negative hypergeometric) density

\[ f(x,y) = \frac{\binom{n}{x}\binom{n}{y}}{B(\alpha,\beta)}\, B(\alpha+x+y,\, 2n+\beta-x-y). \]

This density is symmetric in the sense that f(x,y) = f(y,x).

For values of x and y near 0, f(x,y) may be evaluated inductively
via the following formulae:

\[ f(0,0) = \prod_{i=1}^{2n} \frac{2n+\beta-i}{2n+\alpha+\beta-i} = f(0) \prod_{i=1}^{n} \frac{2n+\beta-i}{2n+\alpha+\beta-i} \]

and

\[ f(x+1,y) = f(x,y) \cdot \frac{(n-x)(\alpha+x+y)}{(x+1)(2n+\beta-x-y-1)}. \]

For values of x and y near n, it is more efficient to use the
following formulae:

\[ f(n,n) = \prod_{i=1}^{2n} \frac{2n+\alpha-i}{2n+\alpha+\beta-i} = f(n) \prod_{i=1}^{n} \frac{2n+\alpha-i}{2n+\alpha+\beta-i} \]

and

\[ f(x-1,y) = f(x,y) \cdot \frac{x(2n+\beta-x-y)}{(n-x+1)(\alpha+x+y-1)}. \]

Consider now the case where it is desired to place examinees
into k classifications or categories defined by k-1 cutoff scores
denoted by the integers \(c_j\), j = 1, 2, ..., k-1 with
\(0 < c_1 < \ldots < c_{k-1} \le n\). The first category consists of all
test scores between 0 and \(c_1 - 1\) inclusive. For the second category,
the test scores range between \(c_1\) and \(c_2 - 1\) inclusive, and so
on. Finally, for the kth category, the test score limits are
\(c_{k-1}\) and n. For binary classification, k = 2 and the cutoff score
c is traditionally referred to as a mastery or passing score. These
two categories are represented as \(\{x: 0 \le x \le c-1\}\) and
\(\{x: c \le x \le n\}\). For k classifications as defined above, the raw
agreement index is expressed as

\[ p = \sum_{j=1}^{k} \; \sum_{x,y=c_{j-1}}^{c_j-1} f(x,y). \]

Here \(c_0 = 0\) and \(c_k = n+1\). The lower limit for decision
consistency is given as

\[ p_c = \sum_{j=1}^{k} \left[ \sum_{x=c_{j-1}}^{c_j-1} f(x) \right]^2. \]

As previously mentioned, the kappa index is defined as
\(\kappa = (p-p_c)/(1-p_c)\).

The formulae become somewhat simpler for binary classifications.
For the use of c near 0, let

\[ p_0 = \sum_{x=0}^{c-1} f(x) \]

and

\[ p_{00} = \sum_{x,y=0}^{c-1} f(x,y). \]

Then

\[ p = 1 - 2(p_0 - p_{00}) \]

and

\[ \kappa = (p_{00} - p_0^2)/(p_0 - p_0^2). \]

On the other hand, for values of c near n, let

\[ p_1 = \sum_{x=c}^{n} f(x) \]

and

\[ p_{11} = \sum_{x,y=c}^{n} f(x,y). \]

Then

\[ p = 1 - 2(p_1 - p_{11}) \]

and

\[ \kappa = (p_{11} - p_1^2)/(p_1 - p_1^2). \]
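
The definitions of this section translate directly into code. The following Python sketch is an illustration under stated assumptions rather than the program of Appendix B: the function names are hypothetical, and the densities are evaluated through log-gamma functions instead of the inductive formulae, which is a substitution made here for brevity.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def univariate(x, n, alpha, beta):
    """Beta-binomial density f(x)."""
    return comb(n, x) * exp(log_beta(alpha + x, n + beta - x) - log_beta(alpha, beta))

def bivariate(x, y, n, alpha, beta):
    """Bivariate beta-binomial density f(x, y) for two equivalent n-item tests."""
    return comb(n, x) * comb(n, y) * exp(
        log_beta(alpha + x + y, 2 * n + beta - x - y) - log_beta(alpha, beta))

def agreement_indices(n, cutoffs, alpha, beta):
    """Return (p, kappa) for the categories defined by the increasing cutoff scores."""
    bounds = [0] + list(cutoffs) + [n + 1]          # c_0 = 0 and c_k = n + 1
    p = p_c = 0.0
    for low, high in zip(bounds, bounds[1:]):
        cells = range(low, high)                    # scores c_{j-1}, ..., c_j - 1
        p += sum(bivariate(x, y, n, alpha, beta) for x in cells for y in cells)
        p_c += sum(univariate(x, n, alpha, beta) for x in cells) ** 2
    return p, (p - p_c) / (1.0 - p_c)

# alpha = 2.25 and beta = 5.25 are the moment estimates for n = 5, mean 1.5, KR21 = .40
p, kappa = agreement_indices(5, [3], 2.25, 5.25)
print(round(p, 3), round(kappa, 3))
```

For n = 5 and c = 3 with these parameter values, the result should land close to the p = .755 and \(\kappa\) = .268 quoted in the example of Appendix A.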



3. ASYMPTOTIC SAMPLING DISTRIBUTION 
OF THE ESTIMATES 

The estimation for p and \(\kappa\) may be carried out by replacing
\(\alpha\) and \(\beta\) by their estimates in the appropriate formulae of
Section 2. There are at least two ways to estimate \(\alpha\) and
\(\beta\), namely the maximum likelihood (ML) principle and the moment
method. Let \(\bar{x}\) and s be the mean and standard deviation of the
test scores of m examinees, and let the estimated KR21 reliability be
defined as

\[ \hat\alpha_{21} = \frac{n}{n-1} \left[ 1 - \frac{\bar{x}(n-\bar{x})}{n s^2} \right]. \]

The moment estimates of \(\alpha\) and \(\beta\) are given as

\[ \hat\alpha = (-1 + 1/\hat\alpha_{21})\, \bar{x} \]

and

\[ \hat\beta = -\hat\alpha + n/\hat\alpha_{21} - n. \]

These estimates are positive (thus acceptable) only when
\(0 < \hat\alpha_{21} < 1\). When the test scores do not show sufficient
variability, the computed value for \(\hat\alpha_{21}\) may be zero or
negative. If this happens, replace this computed value by the smallest
positive estimate for test reliability which happens to be available.
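
As a brief illustration of the moment method, the following Python sketch computes KR21 and the moment estimates from the summary statistics. The function names are assumptions made for this example, and raising an error for an unusable KR21 is one possible handling, not necessarily the one used in the program of Appendix B.

```python
def kr21(n, mean, sd):
    """KR21 reliability estimate from the test length, mean, and standard deviation."""
    return n / (n - 1) * (1.0 - mean * (n - mean) / (n * sd ** 2))

def moment_estimates(n, mean, sd):
    """Return (alpha_hat, beta_hat, kr21) for the beta-binomial model."""
    a21 = kr21(n, mean, sd)
    if not 0.0 < a21 < 1.0:
        # The moment estimates are positive only when 0 < KR21 < 1.
        raise ValueError("KR21 outside (0, 1); moment estimates do not exist")
    alpha_hat = (-1.0 + 1.0 / a21) * mean
    beta_hat = -alpha_hat + n / a21 - n
    return alpha_hat, beta_hat, a21

# Summary data of the computer-program example: n = 8, mean 4.8, s = 2.22596
alpha_hat, beta_hat, a21 = moment_estimates(8, 4.8, 2.22596)
print(round(alpha_hat, 4), round(beta_hat, 4), round(a21, 4))
```

With these inputs the estimates agree, up to rounding, with the ALPHA, BETA, and KR21 values printed in the sample output of the computer program (2.05710, 1.37140, and 0.70000).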

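Maximum likelihood estimations for \(\alpha\) and \(\beta\) have been
considered by Griffiths (1973). A fairly efficient algorithm has been
provided by Huynh (1977). Starting with the moment estimates, the
Newton-Raphson procedure as implemented by Huynh has been found to
converge very quickly in practically all cases considered by the
author. It has been found that the ML estimates, in most cases, do
not differ appreciably from the moment estimates \(\hat\alpha\) and
\(\hat\beta\), hence general sampling properties appropriate for the ML
estimates would be applicable to \(\hat\alpha\) and \(\hat\beta\). For
example, asymptotically,
\(\sqrt{m}\,(\hat\alpha-\alpha,\ \hat\beta-\beta)\) follows a bivariate
normal distribution with zero mean and covariance matrix
\(\Sigma = (\sigma_{pq}) = \| b_{pq} \|^{-1}\) where

\[ b_{11} = \sum_{x=0}^{n} \left[ \frac{\partial f(x)}{\partial\alpha} \right]^2 \Big/ f(x), \]

\[ b_{12} = b_{21} = \sum_{x=0}^{n} \frac{\partial f(x)}{\partial\alpha} \cdot \frac{\partial f(x)}{\partial\beta} \Big/ f(x), \]

and

\[ b_{22} = \sum_{x=0}^{n} \left[ \frac{\partial f(x)}{\partial\beta} \right]^2 \Big/ f(x). \]

Now let \(p = p(\alpha,\beta)\) and \(\kappa = \kappa(\alpha,\beta)\) be the
functions of \((\alpha,\beta)\) defining the two reliability indices. By
replacing \(\alpha\) and \(\beta\) by \(\hat\alpha\) and \(\hat\beta\)
respectively, the moment estimates \(\hat{p}\) and \(\hat\kappa\) may be
obtained for p and \(\kappa\). It may be noted that both p and \(\kappa\)
are continuous with respect to \((\alpha,\beta)\). It follows from Rao
(1973, p. 386-7), that as m goes to infinity, \(\sqrt{m}\,(\hat{p}-p)\)
and \(\sqrt{m}\,(\hat\kappa-\kappa)\) converge to two normal distributions
with zero means and with variances

\[ V_p^2 = \sigma_{11} \left( \frac{\partial p}{\partial\alpha} \right)^2 + 2\sigma_{12}\, \frac{\partial p}{\partial\alpha}\, \frac{\partial p}{\partial\beta} + \sigma_{22} \left( \frac{\partial p}{\partial\beta} \right)^2 \]

and

\[ V_\kappa^2 = \sigma_{11} \left( \frac{\partial\kappa}{\partial\alpha} \right)^2 + 2\sigma_{12}\, \frac{\partial\kappa}{\partial\alpha}\, \frac{\partial\kappa}{\partial\beta} + \sigma_{22} \left( \frac{\partial\kappa}{\partial\beta} \right)^2, \]

respectively. Thus, it may be said that \(\hat{p}\) has an approximate
normal distribution with mean p and standard deviation (standard
error) of \(\sigma_\infty(\hat{p}) = V_p/\sqrt{m}\) when m is sufficiently
large. An estimated standard error for \(\hat{p}\), namely
\(s_\infty(\hat{p})\), may be obtained by replacing \(\alpha\) and
\(\beta\) by their estimated values \(\hat\alpha\) and \(\hat\beta\). The
discussion also holds for \(\hat\kappa\). Thus \(\hat\kappa\) has an
approximate normal distribution with mean \(\kappa\) and standard error
\(\sigma_\infty(\hat\kappa) = V_\kappa/\sqrt{m}\). The estimated standard
error \(s_\infty(\hat\kappa)\) may be obtained in the same way as
\(s_\infty(\hat{p})\).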

4. TABLES FOR \(p\), \(V_p\), \(\kappa\), AND \(V_\kappa\)
FOR SHORT TESTS

Appendix A presents tables which facilitate the computations
for the reliability indices p and \(\kappa\) and their standard errors
for the case of tests having 5 to 10 items. All computations were
carried out via the IBM 370/168 system at the University of South
Carolina, using the double precision mode.

Input data to the tables are (1) number of test items, n,
(2) mastery or passing score, c, (3) test mean, \(\bar{x}\), and (4) the
KR21 reliability estimate, \(\hat\alpha_{21}\). It may be noted that if
\(\tilde\alpha\) and \(\tilde\beta\) are any estimates of the parameters
\(\alpha\) and \(\beta\) other than the moment estimates, then the entries
for test mean and KR21 are simply
\(n\tilde\alpha/(\tilde\alpha+\tilde\beta)\) and
\(n/(n+\tilde\alpha+\tilde\beta)\), respectively.

For each entry of \((n, c, \bar{x}, \hat\alpha_{21})\), four values may be
read out. They are \(\hat{p}\), \(V_p\), \(\hat\kappa\), and \(V_\kappa\),
respectively. Both \(V_p\) and \(V_\kappa\) are enclosed in parentheses.

The tables are constructed for n = 5 (1) 10 and \(\hat\alpha_{21}\) =
.10 (.10) .90. For each n, the mastery score c is set equal to
\(n_0, n_0+1, \ldots, n-1, n\) where \(n_0\) is the smallest integer such
that \(n_0 \ge n/2\) and with \(\bar{x}\) = n times a decimal which ranges
from .10 to .90 in steps of .10. To read the values of \(\hat{p}\),
\(V_p\), \(\hat\kappa\), and \(V_\kappa\) for a mastery score of
\(c < n_0\), simply enter the tables with a mastery score of n-c+1
and a test mean of \(n-\bar{x}\).

Numerical Example 1 

Let n = 10, \(\bar{x}\) = 6, \(\hat\alpha_{21}\) = .50, and c = 7. Then
\(\hat{p}\) = .680, \(V_p\) = .278, \(\hat\kappa\) = .347, and
\(V_\kappa\) = .582. If the data are obtained from a random sample of
m = 36 examinees, then the estimated standard errors are
\(s_\infty(\hat{p})\) = .278/6 = .046 for \(\hat{p}\) and
\(s_\infty(\hat\kappa)\) = .582/6 = .097 for \(\hat\kappa\).

Numerical Example 2 



Let n = 8, \(\bar{x}\) = 6.4, \(\hat\alpha_{21}\) = .30, and c = 3. Here
\(n_0\) = 4. The values of \(\hat{p}\), \(V_p\), \(\hat\kappa\), and
\(V_\kappa\) may be obtained by using the entry n = 8,
\(\bar{x}\) = 8 - 6.4 = 1.6, \(\hat\alpha_{21}\) = .30, and
c = 8 - 3 + 1 = 6. The results are \(\hat{p}\) = .988, \(V_p\) = .075,
\(\hat\kappa\) = .050, and \(V_\kappa\) = .448. With m = 25, for example,
the estimated standard errors are \(s_\infty(\hat{p})\) = .015 and
\(s_\infty(\hat\kappa)\) = .090.

5. INTERPOLATION 

As revealed through the tables, \(\hat{p}\), \(V_p\), \(\hat\kappa\), and
\(V_\kappa\) are not monotonically increasing or decreasing functions of
\(\bar{x}\) at each \(\hat\alpha_{21}\), or of \(\hat\alpha_{21}\) at each
\(\bar{x}\). Hence interpolation should not be carried out
indiscriminately. However, in situations where \(\hat\alpha_{21}\),
\(\bar{x}/n\), and c/n are not too extreme, for example when all these
quantities are between .20 and .80, the monotonicity property usually
holds. If so, bivariate linear interpolation may be safely carried out
to approximate the values of \(\hat{p}\), \(V_p\), \(\hat\kappa\), and
\(V_\kappa\).

Suppose \(\hat\alpha_{21}\) and \(\bar{x}\) represent the computed values
of KR21 and the test mean. In general, let
\(f(\hat\alpha_{21}, \bar{x})\) be any one of the quantities \(\hat{p}\),
\(V_p\), \(\hat\kappa\), or \(V_\kappa\) that are needed but not found in
the tables. Let \(u_1\) and \(u_2\) (where
\(u_1 \le \hat\alpha_{21} \le u_2\)) be the two tabulated values closest
to the computed \(\hat\alpha_{21}\)-value. Also, let \(v_1\) and \(v_2\)
(where \(v_1 \le \bar{x} \le v_2\)) be the two tabulated values closest
to the computed \(\bar{x}\)-value. Define the following:

\[ r = \frac{\hat\alpha_{21}-u_1}{u_2-u_1} \]

and

\[ s = \frac{\bar{x}-v_1}{v_2-v_1}. \]

Then the linearly interpolated value for \(f(\hat\alpha_{21}, \bar{x})\)
is given as

\[ f(u,v) = (1-r)(1-s)\, f(u_1,v_1) + r(1-s)\, f(u_2,v_1) + s(1-r)\, f(u_1,v_2) + rs\, f(u_2,v_2) \]

(see Abramowitz & Stegun, 1968, Formula 25.2.66).

Numerical Example

Let n = 10, \(\hat\alpha_{21}\) = .56 (= u), and \(\bar{x}\) = 4.77 (= v).
Here \(u_1\) = .50, \(u_2\) = .60, r = .60, \(v_1\) = 4.00, \(v_2\) = 5.00,
and s = .77. At the mastery or passing score c = 7, it may be found
that the p-values are \(f(u_1,v_1)\) = .839, \(f(u_2,v_1)\) = .836,
\(f(u_1,v_2)\) = .742, and \(f(u_2,v_2)\) = .761. Hence the linearly
interpolated value for p at \(\hat\alpha_{21}\) = .56 and
\(\bar{x}\) = 4.77 is given as .40 x .23 x .839 + .60 x .23 x .836 +
.77 x .40 x .742 + .60 x .77 x .761 = .773. In the same way, other
linearly interpolated values are \(V_p\) = .205, \(\kappa\) = .365, and
\(V_\kappa\) = .574. The exact values for p, \(V_p\), \(\kappa\), and
\(V_\kappa\) computed directly from the formulae of Section 3 are .771,
.201, .364, and .574, respectively.
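
The interpolation rule is short enough to write out. The following Python sketch (the function name and argument order are assumptions made for this illustration) applies the bivariate linear formula to the four tabulated p-values of the example.

```python
def bilinear(u, v, u1, u2, v1, v2, f11, f21, f12, f22):
    """Bivariate linear interpolation between four tabulated entries.

    f11 = f(u1, v1), f21 = f(u2, v1), f12 = f(u1, v2), f22 = f(u2, v2).
    """
    r = (u - u1) / (u2 - u1)   # relative position of KR21 between u1 and u2
    s = (v - v1) / (v2 - v1)   # relative position of the test mean between v1 and v2
    return ((1 - r) * (1 - s) * f11 + r * (1 - s) * f21
            + s * (1 - r) * f12 + r * s * f22)

p_interp = bilinear(0.56, 4.77, 0.50, 0.60, 4.00, 5.00, 0.839, 0.836, 0.742, 0.761)
print(round(p_interp, 3))
```

The same call, fed with the tabulated values of \(V_p\), \(\kappa\), or \(V_\kappa\) at the four neighboring entries, gives the other interpolated quantities.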

6. APPLICATIONS 

Besides easing the computations for p, \(\kappa\), and their standard
errors in the case of short tests, the tables may be used to
establish confidence intervals for p and \(\kappa\), to test the equality
of two or several independent p or \(\kappa\)'s, and to answer questions
regarding sample size in reliability studies for mastery tests.

6.1. Inference for One Sample

Let a 5-item test be administered to 100 students and let the
summary test data be \(\bar{x}\) = 3.500 and \(\hat\alpha_{21}\) = .400. At
the mastery score c = 4, the tables yield the values \(\hat{p}\) = .650,
\(V_p\) = .386, \(\hat\kappa\) = .293, and \(V_\kappa\) = .760. The
estimated standard errors are \(s_\infty(\hat{p})\) = .386/10 = .039 and
\(s_\infty(\hat\kappa)\) = .763/10 = .076. The 90% confidence intervals
are .650 ± 1.645 x .039 or (.586, .714) for the parameter p, and
.293 ± 1.645 x .076 or (.168, .418) for the parameter \(\kappa\).
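
The one-sample computations can be sketched as follows. This Python fragment is illustrative only; the function names are assumptions, and the normal quantile is taken from the standard library rather than from a table. Because the standard error is not rounded to three decimals before use, the limits may differ from the hand computation in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(v, m):
    """Estimated standard error s = V / sqrt(m) for a tabulated V and m examinees."""
    return v / sqrt(m)

def confidence_interval(estimate, v, m, level=0.90):
    """Two-sided normal-theory confidence interval for p or kappa."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.645 for a 90% level
    half_width = z * standard_error(v, m)
    return estimate - half_width, estimate + half_width

def z_ratio(estimate, null_value, v, m):
    """Ratio (estimate - hypothesized value) / standard error."""
    return (estimate - null_value) / standard_error(v, m)

print(confidence_interval(0.650, 0.386, 100))   # interval for p
print(confidence_interval(0.293, 0.760, 100))   # interval for kappa
print(z_ratio(0.650, 0.500, 0.386, 100))        # test of the hypothesis p = .50
```

The ratio returned by `z_ratio` is compared with critical values of the unit normal distribution, for example ±1.645 for a two-sided test at the 10% level.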

Hypothesis testing may also be conducted for the one-sample
case. To test the null hypothesis that p is equal to a specified
value \(p_H\) against an appropriate alternative, simply compare the
Student-like ratio \(t_p = (\hat{p}-p_H)/s_\infty(\hat{p})\) with suitably
chosen critical value(s) read from the unit normal distribution. For
\(\kappa\), use the ratio
\(t_\kappa = (\hat\kappa-\kappa_H)/s_\infty(\hat\kappa)\). With the data
provided in this section, the null hypothesis \(p_H\) = .50 corresponds
to the Student-like ratio \(t_p\) = (.650 - .500)/.039 = 3.846. The null
hypothesis \(\kappa_H\) = .350 is associated with the ratio
\(t_\kappa\) = (.293 - .350)/.076 = -.75. If the alternatives are
two-sided and if the level of significance is 10% (at which the
critical values are ±1.645), the null hypothesis for \(p_H\) is
rejected, whereas the one for \(\kappa_H\) is accepted.

6.2. Inference for Two Independent Samples 

Any inference for the case of two independent samples may be
carried out by noting that the standard error of
\(\hat{p}_1-\hat{p}_2\), where \(\hat{p}_1\) and \(\hat{p}_2\) are two
independent sample p-values, is

\[ s_\infty(\hat{p}_1-\hat{p}_2) = \sqrt{ s_\infty^2(\hat{p}_1) + s_\infty^2(\hat{p}_2) }. \]

For two independent \(\hat\kappa_1\) and \(\hat\kappa_2\), the standard
error of \(\hat\kappa_1-\hat\kappa_2\) is given as

\[ s_\infty(\hat\kappa_1-\hat\kappa_2) = \sqrt{ s_\infty^2(\hat\kappa_1) + s_\infty^2(\hat\kappa_2) }. \]

For example, let the data for the first sample be n = 5, c = 4,
\(\bar{x}\) = 4.000, \(\hat\alpha_{21}\) = .600, and m = 100. It follows
that \(\hat{p}_1\) = .785, \(s_\infty(\hat{p}_1)\) = .0289,
\(\hat\kappa_1\) = .464, and \(s_\infty(\hat\kappa_1)\) = .0675. For the
second sample, chosen independently from the first one, let n = 8,
c = 6, \(\bar{x}\) = 4.8, \(\hat\alpha_{21}\) = .300, and
m = 6[illegible]. It may be verified that \(\hat{p}_2\) = .633,
\(s_\infty(\hat{p}_2)\) = .0398, \(\hat\kappa_2\) = .196, and
\(s_\infty(\hat\kappa_2)\) = .093. It follows that

\[ s_\infty(\hat{p}_1-\hat{p}_2) = .049 \]

and

\[ s_\infty(\hat\kappa_1-\hat\kappa_2) = .115. \]

These standard errors will allow the formulation of confidence
intervals for the parameters \(p_1-p_2\) and \(\kappa_1-\kappa_2\). For
example, at the 90% confidence level, the confidence intervals are
(.785 - .633) ± 1.645 x .049 or (.071, .233) for \(p_1-p_2\), and
(.464 - .196) ± 1.645 x .115 or (.079, .457) for
\(\kappa_1-\kappa_2\). Student-like ratios may also be computed to test
the equality hypothesis for \(p_1\) and \(p_2\), and for \(\kappa_1\) and
\(\kappa_2\). For \(p_1 = p_2\), the mentioned ratio is
t = (.785 - .633)/.048 = 3.167 and for \(\kappa_1 = \kappa_2\), the
corresponding ratio is (.464 - .196)/.115 = 2.330. With two-sided
alternatives and with a level of significance of 10% (at which the
critical values are ±1.645), both equality hypotheses are rejected.

6.3. Testing Equality of Several Independent p or \(\kappa\)'s

The mechanism by which equality of several p (or \(\kappa\)) values is
to be tested is similar to the one by which several independent
correlations are compared (Rao, 1973, page 434). Let \(\hat{p}_i\) and
\(s_\infty(\hat{p}_i)\), i = 1, 2, ..., I, be the estimated raw agreement
index and its standard error associated with the i-th sample. Let
\(u_i = 1/s_\infty^2(\hat{p}_i)\) be the reciprocal of the error
variance, and let

\[ T_1 = \sum_{i=1}^{I} u_i \hat{p}_i, \qquad T_2 = \sum_{i=1}^{I} u_i \hat{p}_i^2, \]

and

\[ B = \sum_{i=1}^{I} u_i. \]

Then the statistic for testing homogeneity of the p-values is

\[ H = T_2 - (T_1^2/B), \]

which can be used as \(\chi^2\) with I-1 degrees of freedom. Table 1
presents the data and various computations for the statistic H.
With the value H = 1.738 and I-1 = 3 degrees of freedom (at which
the 5% critical value is 7.815), it may be concluded that the four
independent \(\hat{p}\) values do not differ significantly from each
other at the 5% level of significance.
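
The homogeneity statistic can be computed directly from the tabulated \(V_p\) values, the sample sizes, and the estimates. The following Python sketch is illustrative (the function name and the input layout are assumptions); the critical value 7.815 is the one quoted in the text for 3 degrees of freedom and is supplied by hand, since the standard library offers no chi-square quantile.

```python
def homogeneity_statistic(samples):
    """Return (H, degrees of freedom) for samples given as (p_hat, V_p, m) triples."""
    samples = list(samples)
    # u_i = 1 / s_i^2 with s_i = V_p / sqrt(m), i.e. u_i = m / V_p^2
    weights = [m / v ** 2 for _, v, m in samples]
    b = sum(weights)
    t1 = sum(u * p for u, (p, _, _) in zip(weights, samples))
    t2 = sum(u * p ** 2 for u, (p, _, _) in zip(weights, samples))
    return t2 - t1 ** 2 / b, len(samples) - 1

# (p_hat, V_p, m) for the four samples of Table 1
table_1 = [(0.730, 0.269, 64), (0.776, 0.239, 25), (0.765, 0.206, 100), (0.721, 0.267, 49)]
h, df = homogeneity_statistic(table_1)
print(round(h, 3), df)
print(h > 7.815)   # True would mean rejecting homogeneity at the 5% level
```

Small differences in the third decimal of H relative to Table 1 can arise because the table rounds the standard errors before forming the weights.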

TABLE 1

An Illustration of Homogeneity Testing for p

 n   c    m   mean  KR21   V_p    s(p_i)      u_i      p_i    u_i p_i   u_i p_i^2
 5   4   64   3.0   .60   .269   .033625    884.454   .730    645.652    471.326
 8   7   25   4.8   .40   .239   .047800    437.667   .776    339.630    263.553
10   6  100   5.0   .70   .206   .020600   2356.490   .765   1802.715   1379.077
 9   6   49   6.3   .50   .267   .038143    687.337   .721    495.570    357.306

Total                                      4365.948          3283.567   2471.262

Summary data:   B = 4365.948
                T_1 = 3283.567
                T_2 = 2471.262
Test statistic: H = 1.738 with df = 4-1 = 3

6.4. Sample Size Determination 

In some reliability studies for mastery tests, it may be necessary
to determine in advance the minimum number of examinees needed
to achieve a given degree of accuracy. For example, if a standard
error \(s_\infty(\hat{p})\) of no more than \(100\gamma\%\) of the
parameter p is acceptable, then how many examinees should be tested?
The question, of course, may not have an answer unless there are some
indications about the mean and variability of the test scores. In a
number of situations involving an n-item test with a options for each
item, it may not be unreasonable to assume that the test mean is about
halfway between the chance score n/a and the maximum score n and that
the standard deviation s is about one-fourth of the difference between
these two scores. In other words, the "guessed-at" values for
\(\bar{x}\), s, and \(\hat\alpha_{21}\) are given as

\[ \bar{x} = (n + n/a)/2, \]

\[ s = (n - n/a)/4, \]

and

\[ \hat\alpha_{21} = \frac{n}{n-1} \left[ 1 - \frac{\bar{x}(n-\bar{x})}{n s^2} \right]. \]

By entering these values of \(\bar{x}\) and \(\hat\alpha_{21}\), along with
n and c, those of \(\hat{p}\) and \(V_p = \sqrt{m}\, s_\infty(\hat{p})\) may
be deduced. Then m may be approximated by noting that the ratio of
\(V_p/\sqrt{m}\) to \(\hat{p}\) cannot exceed \(\gamma\). In other words,
the minimum number of examinees is \((V_p/(\gamma\hat{p}))^2\).

As an illustration, let n = 8, a = 5, c = 5, and \(\gamma\) = 0.05.
Then \(\bar{x}\) = 4.8, s = 1.6, and \(\hat\alpha_{21}\) = .29. From the
tables, it may be found that approximately \(\hat{p}\) = .615 and
\(V_p\) = .369. The minimum number of examinees is 144. If \(\gamma\) is
.10, then only 36 examinees would be needed.
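
The sample-size rule can be sketched in Python as follows. The function names are assumptions made for this illustration; the values of \(\hat{p}\) and \(V_p\) still have to be read from the tables (or computed separately), and a small tolerance guards the rounding-up step against floating-point noise.

```python
from math import ceil

def guessed_summary(n, options):
    """'Guessed-at' mean, standard deviation, and KR21 for an n-item test
    whose items each have the given number of options."""
    chance = n / options
    mean = (n + chance) / 2.0
    sd = (n - chance) / 4.0
    kr21 = n / (n - 1) * (1.0 - mean * (n - mean) / (n * sd ** 2))
    return mean, sd, kr21

def minimum_examinees(v_p, p_hat, gamma):
    """Smallest m for which V_p / sqrt(m) does not exceed gamma * p_hat."""
    return ceil((v_p / (gamma * p_hat)) ** 2 - 1e-9)

print(guessed_summary(8, 5))                   # mean, s, and KR21 used to enter the tables
print(minimum_examinees(0.369, 0.615, 0.05))   # relative standard error of 5%
print(minimum_examinees(0.369, 0.615, 0.10))   # relative standard error of 10%
```

Halving the tolerated relative error quadruples the required number of examinees, as the two calls above illustrate.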

7. COMPUTER PROGRAM 



Appendix B lists a FORTRAN IV program which computes the
values of \(\hat{p}\), \(s_\infty(\hat{p})\), \(\hat\kappa\), and
\(s_\infty(\hat\kappa)\) for situations with k classifications. The input
data are to be keypunched on three cards detailed as follows.



First Card 



This contains the title of the problem, keypunched anywhere 
between columns 1 and 80. 



Second Card 



This provides data on number of items (n), number of examinees
(m), number of classifications (k), the test mean (\(\bar{x}\)), and
the test standard deviation (s). These must be keypunched according
to the format (3I5, 2F10.5).



Third Card 



This contains the (k-1) cutoff scores, keypunched with the
format (16I5). Thus reliability problems with up to 17 classifications
may be implemented via this FORTRAN program.

TABLE 2

An Output of the Computer Program

    ESTIMATES OF DECISION RELIABILITY
    AND THEIR STANDARD ERRORS IN
    MASTERY TESTING BASED ON THE BETA-
    BINOMIAL MODEL
    TITLE OF THIS JOB IS:
    AN EXAMPLE OF RELIABILITY COMPUTATION

    INPUT DATA ARE:

    NUMBER OF ITEMS                  = 8
    NUMBER OF SUBJECTS               = 25
    MEAN OF TEST SCORE               = 4.80000
    STANDARD DEVIATION OF TEST SCORE = 2.22596
    NUMBER OF CATEGORIES             = 2
    CUTOFF SCORE                     = 5

    OUTPUT DATA ARE:

    ALPHA = 2.05710
    BETA  = 1.37140
    KR21  = 0.70000

    RAW AGREEMENT INDEX P   = 0.77095
    STANDARD ERROR OF P     = 0.04345

    KAPPA INDEX             = 0.53165
    STANDARD ERROR OF KAPPA = 0.08871

    ** NORMAL END FOR THIS JOB **
    PROGRAM WRITTEN BY HUYNH HUYNH
    COLLEGE OF EDUCATION
    UNIVERSITY OF SOUTH CAROLINA
    COLUMBIA, SOUTH CAROLINA 29208
    REVISED, DECEMBER 1979

The computer program starts with the computation of
\(\hat\alpha_{21}\). If \(\hat\alpha_{21}\) is zero or negative, the
following message will be printed:

    NON-POSITIVE ESTIMATE KR21.
    MOMENT ESTIMATES FOR ALPHA AND BETA DO NOT EXIST.
    COMPUTATIONS DISCONTINUED FOR THIS CASE.

Otherwise, the estimates \(\hat\alpha\) and \(\hat\beta\) will be
obtained. These, in turn, will be used as input in a subroutine which
computes \(\hat{p}\), \(s_\infty(\hat{p})\), \(\hat\kappa\), and
\(s_\infty(\hat\kappa)\).

For example, let the input cards be as follows:

                           1    1    2    2    3    3
    Column      : 1...5....0....5....0....5....0....5
    First Card  : AN EXAMPLE OF RELIABILITY COMPUTATION
    Second Card :     8   25    2       4.8   2.22596
    Third Card  :     5

In other words, n = 8, m = 25, k = 2, \(\bar{x}\) = 4.8, s = 2.22596,
c = 5. The output is printed in Table 2. It may be read that
\(\hat{p}\) = .77095, \(s_\infty(\hat{p})\) = .04345,
\(\hat\kappa\) = .53165, and \(s_\infty(\hat\kappa)\) = .08871.

Several problems may be performed in one run by stacking the
input cards together.

8. DISCLAIMER 

The computer program presented in this report has been written 
with care and tested extensively under a variety of conditions 
using tests with 60 or fewer items. The author, however, makes no 
warranty as to its accuracy and functioning, nor shall the fact of 
its distribution imply such warranty. 
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APPENDIX A 

Tables of the Raw Agreement Index and Its Standard Error 
Times the Square Root of m, the Kappa Index and Its 
Standard Error Times the Square Root of m, 
When the Beta-Binomial Model is Assumed 

(m = Number of Subjects)

Input data to the tables are (i) number of test items (n),
(ii) mastery score (c), (iii) test mean ($\bar{x}$), and (iv) the KR21
reliability ($\alpha_{21}$). (Note that if $\hat{\alpha}$ and $\hat{\beta}$ are any estimates of the
parameters $\alpha$ and $\beta$ other than the moment estimates, then the entries
for test mean and KR21 are simply $n\hat{\alpha}/(\hat{\alpha}+\hat{\beta})$ and
$n/(n+\hat{\alpha}+\hat{\beta})$, respectively.)

For each entry of (n, c, $\bar{x}$, $\alpha_{21}$), four values may be read out.
They are $\hat{p}$, $V_p$, $\hat{\kappa}$, and $V_\kappa$, respectively. Both $V_p$ and $V_\kappa$ are
enclosed in parentheses.
Example

Let n = 5, c = 3, $\bar{x}$ = 1.5, and $\alpha_{21}$ = .400. The tables provide
the values $\hat{p}$ = .755, $V_p$ = .267, $\hat{\kappa}$ = .268, and $V_\kappa$ = .784. With
m = 100, for example, the estimated standard errors are $s(\hat{p})$ = .0267
and $s(\hat{\kappa})$ = .0784.
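
The two indices in this example can be checked with a short Python sketch. It is a minimal illustration under stated assumptions, not the program used to produce the tables: an examinee is taken to be a master when the score is at or above c, the two test administrations are taken to be independent binomials given a true score that follows a beta distribution, and kappa is the raw agreement corrected for chance agreement. All function names are hypothetical. The standard errors themselves are not recomputed; the tabled V is simply divided by the square root of m.

```python
from math import comb, exp, lgamma, sqrt


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def marginal_pmf(x, n, alpha, beta):
    """Beta-binomial probability of score x on one n-item test."""
    return comb(n, x) * exp(log_beta(alpha + x, beta + n - x) - log_beta(alpha, beta))


def joint_pmf(x, y, n, alpha, beta):
    """Probability of scores (x, y) on two administrations (assumed model)."""
    return comb(n, x) * comb(n, y) * exp(
        log_beta(alpha + x + y, beta + 2 * n - x - y) - log_beta(alpha, beta)
    )


def agreement_indices(n, c, alpha, beta):
    """Raw agreement index p and kappa for a two-category (master/nonmaster) decision."""
    p_master = sum(marginal_pmf(x, n, alpha, beta) for x in range(c, n + 1))
    p_both_master = sum(
        joint_pmf(x, y, n, alpha, beta)
        for x in range(c, n + 1)
        for y in range(c, n + 1)
    )
    p_both_nonmaster = 1 - 2 * p_master + p_both_master
    p = p_both_master + p_both_nonmaster
    p_chance = p_master ** 2 + (1 - p_master) ** 2
    kappa = (p - p_chance) / (1 - p_chance)
    return p, kappa


def standard_error(v, m):
    """Convert a tabled value V (standard error times sqrt(m)) to a standard error."""
    return v / sqrt(m)


# Example above: n = 5, c = 3, mean = 1.5, KR21 = .400.
n, c, mean, a21 = 5, 3, 1.5, 0.400
alpha = (1 / a21 - 1) * mean        # 2.25
beta = (1 / a21 - 1) * (n - mean)   # 5.25
p, kappa = agreement_indices(n, c, alpha, beta)
print(f"p = {p:.3f}, kappa = {kappa:.3f}")
print(f"s(p) = {standard_error(0.267, 100):.4f}, s(kappa) = {standard_error(0.784, 100):.4f}")
```

Under these assumptions the computed indices should round to the tabled .755 and .268, and the two standard errors are the .0267 and .0784 given in the example.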
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Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 5
Mastery score C = 3

Test                                   KR21 =
Mean   .100    .200    .300    .400    .500    .600    .700    .800    .900

0 ' 5 rS^wJ'JSSw 0,957 0,949 °' 942 0.939""or940 _ "o"948~~Or96r 
0.022 0.062 0.122 0.198 0.288 0.392 0.510 0 643 0 798 
(0.477) (0.734) (0.928) (1.048) (1.091) (1.063) (0.969) (0.808) (0*570) 
1.0 0.879 0.869 0.862 0.858 0.858 0.864 0.877 0 901 0 938 

t*'-H\, i'ill* * 162 0,239 0,325 0- 421 0- 529 0-652 0.800 
(0./06) (0.808) v0 .853) (0.863) (0.831) (0. 769) (0.680) (0.563) (0.405) 

K5 rH'^wJ'^w 0 - 743 0,755 °' 772 °' 795 0.824 0.364 0. 913 
0 al7 )( a'V?¥ ( ?- 2 ? 9 >(0- 267 ) (0.245) (0. 223) (0.201) (0. 175) (0 137) 
(n * 7 L,«'ll 2 ^,°' 192 0,268 0,351 °' 441 °-5«2 0.659 0.801 
(0-874) (0.865) (0.833) (0.734) (0.720) (0.646) (0.561) (0.463) (0*339) 

2.0 0.591 0. 617 0.645 0.675 0. 709 0. 746 0. 789 0 .140 n Qfifi 

^•i?^ ( 2-?;? > <0 - 365> <o - 332> <o - 2 "> (S:?SS> <S:?j;> 

0.067 0.137 0.209 0.285 0.365 0.453 0.550 0 662 
(0.973) (0.898) (0.821) (0. 744) (0.666) (2.587) (J. j") (O.U/) (S.1o9) 

2.5 0.5:* 0.571 0.607 0.645 0.635 0.728 0.776 O 832 O 901 

u.u/u 0. 142 0.215 0.290 0. 370 0 457 n n aaa n on/ 

(1.006) (0.909) (0.818) (0.732) (0.115) (fcjg, (VMl) (0.' 4 03) (S.ISo) 

3.0 0.591 0.617 0.645 0.675 0.709 0.746 0 789 n flan n ons 

( •067 )( ^?3, )( n 0 iS^ < ^" 2)(0 • 299 ) ( ° : "'><°•"2^(o: 5°)(0°: ) 
0.067 0. 137 0.209 0.285 0. 365 0.453 0 550 0 662 n Rfl? 

(0."3)(0.898)(0.821)(0.744)(C.666)(S:i; 7 )(S:foS)(S:"m 

3 * 5 fn'lll^rl'llt, 0, 743 0,755 °' 772 °' 795 0.824 0.864 0.918 
( S , n 3 ? )( 2 ,3 o 1 , 3)( S- 289) '°- 267) (0- 245 >(0- 223 ) (0. 201) (0. 175)(0 137) 
0.057 0.122 0.192 0.268 0.351 0.441 0 542 O 

(0.874) (0.865) (0.833) (0. 784) (0. 720) C0.6W cS.I") (SlJSS) (SlSSi) 

4,0 ,°/™ 9 w 0,869 0- 862 °' 858 °- 8 58 0.864 0.877 0 901 0 938 
(0.297) (0.276) (0.252) (0.226) (0.202) (0.180) (0 162) (0 146) (2*119) 

0.042 0.096 0.162 0.239 0.325 0.421 0.529 0 652 0800 
(0.706)(0.808)(0.858)(0.863)(0.831)(0:7M)(S:i5J)(S:"^ 

4.5 0.975 0.966 0.957 0.949 0.942 0.939 0 940 n qar n 

( 0 0, j 57 >(^ 72 ^^ 

/°.\°55w 0, °° 2 0,122 0- 198 0- 288 0-392 0.510 0.643 0 798 
ittlllitll^ (0 * 928) (1 * 048) (1 -° 91) (1,063 > (0- 969 > (0.308) (2: 57 0) 
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Table of the Raw Agreement Index and its 

S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 5
Mastery score C = 4

Test                                   KR21 =
Mean   .100    .200    .300    .400    .500    .600    .700    .800    .900

0.5 0.998 0.996 0.992 0.987 0.981 0.974 o"96 8 _ "o" 96a""o~97 l" 
(0.028) (0.045) (0.064) (0,084) (0.101) (0. lo8) (0. 102) (0.083) (0.068) 

0.005 0.021 0.055 0.111 0.192 0.297 0.427 0.583 0.768 
(0.142) (0.355) (0.611) (0.855) (1.041) (1.136) (1.118) (0.971) (0.682) 

1.0 0.980 0.973 0.963 0.953 0.942 0.932 '.925 0.926 0.9*5 

(0. 120) (0. 140) (0. 157) (0. 167) (0. 167) (0. 156) x 0. 133) (0.108) (0.094) 
0.014 0.042 0.088 0.152 0.235 0.338 0.459 0.603 0.775 
(0.300) (0.491) (0.661) (0.787) (0.854) (0.857) (0 . 796) (0. 670) (0 . 473) 

15 ,° n ' 9 JL . 0,916 0 * 903 °* 891 °' 882 0.376 0.876 0.889 0.923 
(0.242)^0.243) (0.237) (0.223) (0.202) (0. 175) (0. 148) (0. 127) (0.114) 

,°» 067 °- 123 0. 192 0.276 0.374 0.487 0.620 0.782 
(0.433) (0.620) (0.715) (0.764) (0.767) (0.727) (0.650) (0.537) (0.384) 

2.0 0.830 0.820 0.813 0.808 0.809 0.815 0.830 0.858 0-907 
(0.316) (0 292) (0.266) (0.238) (0.211) (0.186) (0.166) (0.150) (Oi 131) 

0.041 0.C93 0.155 0.228 0.311 0.404 0.511 0.635 0.787 
(0.666) (0.729) (0.755) (0.747) (0.710) (0.648) (0.565) (0.464) (0.337) 

2 * 5 ,° n 'tlL ,2* 701 0,709 °* 721 °' 738 °- 761 0.793 0.836 0.899 
(0.323) (0.299) (0 . 2 7 7) (0 . 25 6) (0 . 237) (0. 2 18) (0 . 19 9) (0 . 1 78) (0. 146) 
0.055 0.116 0.184 0.258 0.339 0.429 0.530 0.647 0.792 
(0.827) (0.817) (0.785) (0. 737) (0. 674) (0. 600) (0. 517) (0. 424) (0.313) 

3.0 0.576 0.601 0.628 0.658 0.692 0.730 0.775 0.829 0.898 
(0.401) (0.377) (0.352) (0.325) (0.298) (0. 269) (0. 238) (0. 203) (0. 156) 
0.065 0.134 0.205 0.280 0.361 0.448 0.545 0.657 0.796 
(0.952) (0.884) (0.312) (0.737) (0.660) (0.531) (0.499) (0.412) (0.308) 

3,5 ,~' 5 A\ 0,574 0,612 °- 650 °' 691 °- 73 5 0.784 0.839 0.908 
( 2 , «5J )( 2 ,473)(0,429)(0 - 386) (°- 345 )( 0 - 304 ) (0.262) (0.216) (0.159) 
^'nolwMiJw 0,217 0,293 0<37A °' 460 °-555 0.664 0.800 
(1.027) (0.932) (0.844) (0.760) (0.678) (0.598) (0.516) (0.429) (0.323) 

4,0 ,°/ 6 ? 6 x ,2* 662 0, 689 °' 718 °' 750 °- 785 r >-825 0.871 0.927 
(0.464) (0.428) (0.392) (0.358) (0.324) (0.289) (0 . 252) (0 . 208) (0.150) 

/?'!!I!!w!! ,U2 0,217 °* 294 °' 376 °-* 6 * 0.560 0.669 0.803 
(1.035) (0.969) (0.900^ (0 . 829) (0 . 754) (0 . 675) (0 . 590) (0. 492) (0.370) 

4.5 0.845 0.844 0.847 0.853 0.864 0.879 0.899 0.925 0.958 
(0,31 Z ) (?- 2 91) (0.267) (0.247) (0.231) (0.214) (0.195) (0.167) (0.121) 

fJ'SJJw?"^., 0,198 * 0,279 0,365 0,458 °' 559 °' 671 0-805 
^ 9 ff>fl-0ff) (1.052) (1.036) (0.988) (0.913) (0.810) (0.677) (0.502) 

For the mastery score = 2 enter N-xbar in the test mean column
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Table of the Raw Agreement Index and its 

S.E.*SQRT(M), the Kappa Index and its 
S.E.*SQRT(M) in the Beta-binomial Model 
M = Number of subjects
Number of items N = 5
Mastery score C = 5

Test                                   KR21 =
Mean   .100    .200    .300    .400    .500    .600    .700    .800    .900

0.5   1.000   1.000   0.999   0.998   0.996   0.993   0.988   0.980   0.975
     (0.002) (0.005) (0.010) (0.019) (0.032) (0.051) (0.072) (0.081) (0.062)
      0.000   0.004   0.015   0.040   0.088   0.168   0.288   0.458   0.687
     (0.019) (0.089) (0.231) (0.443) (0.699) (0.949) (1.125) (1.139) (0.893)

1.0   0.999   0.997   0.995   0.992   0.986   0.978   0.966   0.954   0.950
     (0.015) (0.024) (0.037) (0.055) (0.077) (0.100) (0.116) (0.111) (0.080)
      0.002   0.010   0.028   0.062   0.119   0.205   0.326   0.488   0.702
     (0.059) (0.158) (0.303) (0.476) (0.649) (0.787) (0.853) (0.807) (0.613)

1.5   0.992   0.988   0.983   0.975   0.964   0.951   0.935   0.922   0.925
     (0.053) (0.070) (0.091) (0.112) (0.133) (0.148) (0.149) (0.125) (0.092)
      0.006   0.019   0.046   0.089   0.154   0.244   0.363   0.517   0.716
     (0.130) (0.252) (0.393) (0.534) (0.651) (0.723) (0.729) (0.655) (0.488)

2.0   0.973   0.965   0.954   0.942   0.927   0.911   0.895   0.887   0.904
     (0.127) (0.147) (0.165) (0.180) (0.188) (0.184) (0.164) (0.127) (0.105)
      0.012   0.034   0.070   0.122   0.192   0.284   0.400   0.545   0.729
     (0.236) (0.364) (0.487) (0.591) (0.660) (0.682) (0.651) (0.562) (0.416)

2.5   0.928   0.915   0.901   0.886   0.870   0.857   0.849   0.853   0.888
     (0.228) (0.236) (0.239) (0.235) (0.221) (0.196) (0.161) (0.128) (0.125)
      0.021   0.053   0.098   0.158   0.233   0.325   0.437   0.572   0.741
     (0.376) (0.488) (0.579) (0.641) (0.667) (0.652) (0.595) (0.500) (0.371)

3.0   0.843   0.830   0.817   0.806   0.799   0.796   0.803   0.826   0.880
     (0.311) (0.296) (0.275) (0.248) (0.218) (0.185) (0.158) (0.148) (0.151)
      0.033   0.076   0.131   0.197   0.275   0.366   0.472   0.597   0.753
     (0.544) (0.620) (0.668) (0.686) (0.673) (0.629) (0.557) (0.461) (0.347)

3.5   0.714   0.711   0.711   0.715   0.725   0.742   0.770   0.813   0.883
     (0.314) (0.285) (0.257) (0.234) (0.216) (0.205) (0.201) (0.197) (0.173)
      0.047   0.102   0.166   0.237   0.316   0.405   0.505   0.621   0.764
     (0.734) (0.758) (0.757) (0.732) (0.686) (0.621) (0.539) (0.445) (0.342)

4.0   0.576   0.597   0.621   0.649   0.683   0.722   0.759   0.827   0.901
     (0.349) (0.346) (0.343) (0.337) (0.328) (0.313) (0.291) (0.256) (0.196)
      0.063   0.130   0.201   0.277   0.357   0.443   0.537   0.643   0.775
     (0.945) (0.910) (0.861) (0.799) (0.727) (0.646) (0.558) (0.464) (0.366)

4.5   0.560   0.603   0.647   0.691   0.737   0.783   0.832   0.883   0.938
     (0.672) (0.632) (0.587) (0.537) (0.482) (0.422) (0.354) (0.277) (0.183)
      0.080   0.158   0.237   0.316   0.396   0.479   0.567   0.664   0.785
     (1.202) (1.127) (1.046) (0.960) (0.870) (0.776) (0.677) (0.574) (0.464)

For the mastery score = 1 enter N-xbar in the test mean column
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Table of the Raw Agreement Index and its 
S.E.*SQRT(M), the Kappa Index and its 
S.E.*SQRT(M) in the Beta-binomial Model 
M = Number of subjects
Number of items N = 6
Mastery score C = 3

Test                                   KR21 =
Mean   .100    .200    .300    .400    .500    .600    .700    .800    .900

0.6   0.959   0.948   0.938   0.930   0.925   0.924   0.928   0.939   0.961
     (0.202) (0.207) (0.201) (0.188) (0.169) (0.147) (0.128) (0.114) (0.093)
      0.028   0.074   0.137   0.214   0.304   0.404   0.517   0.643   0.792
     (0.553) (0.771) (0.918) (0.995) (1.008) (0.964) (0.869) (0.724) (0.517)

1.2   0.815   0.811   0.811   0.814   0.822   0.836   0.857   0.887   0.931
     (0.320) (0.293) (0.267) (0.242) (0.220) (0.199) (0.179) (0.157) (0.123)
      0.051   0.111   0.180   0.256   0.340   0.431   0.533   0.650   0.793
     (0.793) (0.837) (0.842) (0.816) (0.766) (0.697) (0.611) (0.506) (0.368)

1.8   0.637   0.657   0.679   0.704   0.732   0.764   0.803   0.849   0.910
     (0.395) (0.366) (0.337) (0.309) (0.279) (0.250) (0.218) (0.183) (0.137)
      0.065   0.133   0.204   0.279   0.359   0.446   0.542   0.654   0.793
     (0.930) (0.873) (0.810) (0.741) (0.668) (0.592) (0.510) (0.421) (0.311)

2.4   0.538   0.573   0.609   0.646   0.685   0.727   0.774   0.829   0.898
     (0.487) (0.440) (0.396) (0.354) (0.314) (0.274) (0.235) (0.193) (0.143)
      0.069   0.140   0.212   0.286   0.365   0.450   0.544   0.654   0.792
     (0.973) (0.880) (0.793) (0.710) (0.629) (0.550) (0.470) (0.387) (0.287)

3.0   0.574   0.601   0.629   0.660   0.694   0.732   0.775   0.828   0.896
     (0.416) (0.384) (0.353) (0.321) (0.289) (0.257) (0.222) (0.185) (0.140)
      0.066   0.134   0.205   0.279   0.358   0.444   0.539   0.650   0.791
     (0.933) (0.858) (0.783) (0.706) (0.629) (0.550) (0.470) (0.385) (0.285)

3.6   0.708   0.713   0.721   0.734   0.750   0.773   0.803   0.844   0.903
     (0.328) (0.304) (0.281) (0.258) (0.236) (0.214) (0.191) (0.166) (0.132)
      0.055   0.117   0.185   0.259   0.340   0.428   0.528   0.643   0.788
     (0.820) (0.807) (0.774) (0.724) (0.660) (0.586) (0.503) (0.411) (0.300)

4.2   0.857   0.846   0.838   0.833   0.832   0.837   0.849   0.874   0.918
     (0.305) (0.284) (0.260) (0.234) (0.208) (0.182) (0.160) (0.141) (0.118)
      0.040   0.091   0.154   0.227   0.311   0.404   0.510   0.633   0.785
     (0.645) (0.724) (0.760) (0.757) (0.721) (0.659) (0.575) (0.470) (0.337)

4.8   0.957   0.946   0.934   0.923   0.913   0.906   0.905   0.913   0.940
     (0.192) (0.203) (0.206) (0.200) (0.185) (0.163) (0.137) (0.115) (0.099)
      0.022   0.061   0.115   0.185   0.271   0.371   0.486   0.619   0.780
     (0.429) (0.603) (0.731) (0.804) (0.822) (0.788) (0.708) (0.585) (0.413)

5.4   0.995   0.991   0.986   0.979   0.971   0.964   0.958   0.957   0.968
     (0.052) (0.074) (0.095) (0.113) (0.123) (0.121) (0.107) (0.086) (0.072)
      0.008   0.030   0.073   0.137   0.223   0.329   0.455   0.602   0.775
     (0.210) (0.448) (0.694) (0.896) (1.024) (1.062) (1.006) (0.853) (0.595)

For the mastery score = 4 enter N-xbar in the test mean column
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RELIABILITY IN MASTERY TESTING 

Table of the Raw Agreement Index and its 

S.E.*SQRT(M), the Kappa Index and its 
S.E.*SQRT(M) in the Beta-binomial Model 
M = Number of subjects
Number of items N = 6
Mastery score C = 4

Test                                   KR21 =
Mean   .100    .200    .300    .400    .500    .600    .700    .800    .900

0.6   0.995   0.991   0.986   0.979   0.971   0.964   0.958   0.957   0.968
     (0.052) (0.074) (0.095) (0.113) (0.123) (0.121) (0.107) (0.086) (0.072)
      0.008   0.030   0.073   0.137   0.223   0.329   0.455   0.602   0.775
     (0.210) (0.448) (0.694) (0.896) (1.024) (1.062) (1.006) (0.853) (0.595)

1.2   0.957   0.946   0.934   0.923   0.913   0.906   0.905   0.913   0.940
     (0.192) (0.203) (0.206) (0.200) (0.185) (0.163) (0.137) (0.115) (0.099)
      0.022   0.061   0.115   0.185   0.271   0.371   0.486   0.619   0.780
     (0.429) (0.603) (0.731) (0.804) (0.822) (0.788) (0.708) (0.585) (0.413)

1.8   0.857   0.846   0.838   0.833   0.832   0.837   0.849   0.874   0.918
     (0.305) (0.284) (0.260) (0.234) (0.208) (0.182) (0.160) (0.141) (0.118)
      0.040   0.091   0.154   0.227   0.311   0.404   0.510   0.633   0.785
     (0.645) (0.724) (0.760) (0.757) (0.721) (0.659) (0.575) (0.470) (0.337)

2.4   0.708   0.713   0.721   0.734   0.750   0.773   0.803   0.844   0.903
     (0.328) (0.304) (0.281) (0.258) (0.236) (0.214) (0.191) (0.166) (0.132)
      0.055   0.117   0.185   0.259   0.340   0.428   0.528   0.643   0.788
     (0.820) (0.807) (0.774) (0.724) (0.660) (0.586) (0.503) (0.411) (0.300)

3.0   0.574   0.601   0.629   0.660   0.694   0.732   0.775   0.828   0.896
     (0.416) (0.384) (0.353) (0.321) (0.289) (0.257) (0.222) (0.185) (0.140)
      0.066   0.134   0.205   0.279   0.358   0.444   0.539   0.650   0.791
     (0.933) (0.858) (0.783) (0.706) (0.629) (0.550) (0.470) (0.385) (0.285)

3.6   0.538   0.573   0.609   0.646   0.685   0.727   0.774   0.829   0.898
     (0.487) (0.440) (0.396) (0.354) (0.314) (0.274) (0.235) (0.193) (0.143)
      0.069   0.140   0.212   0.286   0.365   0.450   0.544   0.654   0.792
     (0.973) (0.880) (0.793) (0.710) (0.629) (0.550) (0.470) (0.387) (0.287)

4.2   0.637   0.657   0.679   0.704   0.732   0.764   0.803   0.849   0.910
     (0.395) (0.366) (0.337) (0.309) (0.279) (0.250) (0.218) (0.183) (0.137)
      0.065   0.133   0.204   0.279   0.359   0.446   0.542   0.654   0.793
     (0.930) (0.873) (0.810) (0.741) (0.668) (0.592) (0.510) (0.421) (0.311)

4.8   0.815   0.811   0.811   0.814   0.822   0.836   0.857   0.887   0.931
     (0.320) (0.293) (0.267) (0.242) (0.220) (0.199) (0.179) (0.157) (0.123)
      0.051   0.111   0.180   0.256   0.340   0.431   0.533   0.650   0.793
     (0.793) (0.837) (0.842) (0.816) (0.766) (0.697) (0.611) (0.506) (0.368)

5.4   0.959   0.948   0.938   0.930   0.925   0.924   0.928   0.939   0.961
     (0.202) (0.207) (0.201) (0.188) (0.169) (0.147) (0.128) (0.114) (0.093)
      0.028   0.074   0.137   0.214   0.304   0.404   0.517   0.643   0.792
     (0.553) (0.771) (0.918) (0.995) (1.008) (0.964) (0.869) (0.724) (0.517)

For the mastery score = 3 enter N-xbar in the test mean column




Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 6
Mastery score C = 5

Test                                   KR21 =
Mean   .100    .200    .300    .400    .500    .600    .700    .800    .900

0.6   1.000   0.999   0.998   0.996   0.992   0.986   0.979   0.972   0.973
     (0.006) (0.013) (0.024) (0.039) (0.058) (0.077) (0.088) (0.081) (0.059)
      0.001   0.009   0.029   0.069   0.137   0.235   0.366   0.532   0.737
     (0.048) (0.175) (0.381) (0.631) (0.871) (1.045) (1.101) (1.001) (0.714)

1.2   0.994   0.991   0.985   0.978   0.969   0.958   0.946   0.939   0.946
     (0.047) (0.065) (0.086) (0.107) (0.125) (0.135) (0.129) (0.105) (0.080)
      0.006   0.022   0.054   0.106   0.181   0.280   0.406   0.559   0.748
     (0.143) (0.302) (0.482) (0.650) (0.773) (0.829) (0.804) (0.693) (0.488)

1.8   0.971   0.962   0.951   0.938   0.925   0.912   0.902   0.902   0.923
     (0.142) (0.16?) (0.176) (0.185) (0.184) (0.172) (0.147) (0.116) (0.097)
      0.015   0.042   0.086   0.147   0.226   0.324   0.442   0.583   0.757
     (0.291) (0.446) (0.582) (0.681) (0.730) (0.724) (0.663) (0.552) (0.389)

2.4   0.909   0.895   0.882   0.869   0.859   0.852   0.853   0.866   0.905
     (0.261) (0.258) (0.249) (0.233) (0.211) (0.182) (0.152) (0.128) (0.114)
      0.028   0.063   0.121   0.188   0.269   0.364   0.474   0.604   0.766
     (0.472) (0.584) (0.661) (0.698) (0.694) (0.651) (0.575) (0.469) (0.335)

3.0   0.795   0.787   0.781   0.779   0.781   0.789   0.807   0.838   0.893
     (0.320) (0.293) (0.266) (0.239) (0.212) (0.188) (0.167) (0.150) (0.131)
      0.042   0.095   0.156   0.227   0.307   0.398   0.502   0.623   0.773
     (0.661) (0.706) (0.719) (0.704) (0.662) (0.599) (0.517) (0.420) (0.305)

3.6   0.649   0.659   0.673   0.690   0.712   0.739   0.775   0.823   0.890
     (0.321) (0.301) (0.282) (0.264) (0.246) (0.227) (0.206) (0.181) (0.146)
      0.057   0.119   0.187   0.260   0.339   0.426   0.524   0.638   0.780
     (0.831) (0.805) (0.763) (0.708) (0.642) (0.568) (0.486) (0.397) (0.294)

4.2   0.543   0.575   0.608   0.643   0.681   0.723   0.771   0.827   0.898
     (0.447) (0.415) (0.383) (0.351) (0.318) (0.284) (0.248) (0.207) (0.155)
      0.068   0.137   0.208   0.283   0.362   0.447   0.541   0.650   0.786
     (0.959) (0.880) (0.802) (0.724) (0.647) (0.569) (0.488) (0.403) (0.303)

4.8   0.581   0.614   0.647   0.683   0.720   0.761   0.805   0.856   0.918
     (0.509) (0.463) (0.420) (0.379) (0.339) (0.300) (0.258) (0.212) (0.152)
      0.071   0.144   0.217   0.293   0.373   0.458   0.551   0.658   0.791
     (1.017) (0.935) (0.855) (0.778) (0.702) (0.625) (0.544) (0.454) (0.343)

5.4   0.798   0.803   0.811   0.823   0.839   0.859   0.883   0.914   0.952
     (0.344) (0.318) (0.295) (0.274) (0.255) (0.234) (0.210) (0.177) (0.126)
      0.062   0.130   0.204   0.283   0.367   0.457   0.554   0.663   0.795
     (0.967) (0.996) (0.990) (0.957) (0.903) (0.829) (0.736) (0.617) (0.462)

For the mastery score = 2 enter N-xbar in the test mean column
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Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 6
Mastery score C = 6

Test                                   KR21 =
Mean   .100    .200    .300    .400    .500    .600    .700    .800    .900

0.6 1.000 1.000 1.000 0.999 0.999 0.997 0.993 0.986 "(K978~ 
(0.000) (0.001) (0.003) (0.007) (0.014) (0.028) (0.049) (0.070) (0.063) 

0.000 0.001 0.007 0.022 0.056 0.121 0.231 0.399 0.644 
(0.005) (0.035) (0.119) (0.275) (0.503) (0.771) (1.010) (1.109) (0.9 18) 

1.2 1.000 0.999 0.998 0.997 0.994 0.988 0.979 0.965 0.953 
(0.004) (0.008) (0.015) (0.026) (0.042) (0.065) (0.091) (0. 105) (0.001) 
0.001 0.004 0.014 0.038 0.082 0.156 0.270 0.434 0.663 
(0.022) (0.078) (0.182) (0.332) (0.509) (0.680) (0. 797) (0. 801) (f .628) 

1.8 0.997 0.996 0.993 0.988 0.981 0.970 0.955 0.937 0.929 
(0.022) (0.032) (0.047) (0.066) (0.089) (0.1 13) (0. 131) (0. 127) (0.088) 

0.002 0.010 0.027 0.060 0.113 0.195 0.311 0.469 0.681 
(0.063) (0.148) (0.268) (0. 409 ) (0 . 548) (0.656) (0.703) (0. 658) (0.49 6) 

2.4 0.988 0.983 0.976 0.967 0.954 0.939 0.920 0.903 0.905 

(0.068) (0.086) (0.106) (0.128) (0.148) (0.162) (0.161) (0.135) (0.094) 
0.006 0.021 0.047 0.089 0.151 0.238 0.353 0.503 0.698 
(0.137) (0.245) (0.368) (0.488) (0 . 586) (0. 643) (0.641) (0. 567) (0 .4 1 8) 

3 *° ,!! , ?^ 1 w 0,951 v 0,939 °* 925 0 ' 908 °- 890 0-874 0.866 0.885 
(0.154) (0.172) (0.188) (0.200) (0.203) (0.195) (0.171) (0. 129) (0. 106) 

,2*?Jtw 0,037 0 * 073 °' 125 °' 194 °- 283 0.395 0.535 0.715 
(0.253) (0.366) (0.474) (0.561) (0.616) (0.628) (0.591) (0. 503) (0.368) 

3.6 0.898 0.884 0.869 0.854 0.839 0.827 0.822 0.831 0.873 
(0.263) (0.265) (0.260) (0.248) (0.22 7) (0.196) (0. 159) (0. 130) (0 . 1 3 1) 
0.024 0.059 0.106 0.166 0.240 0.330 0.437 0.567 0.730 
(0.410) (0.505) (0.579) (0.625) (0.637) (0.6 13) (0.552) (0.458) (0.338) 

4.2 0.781 0.770 0.762 0.756 0.755 0.760 0.776 0.809 0.872 
(0.323) (0.297) (0.269) (0.239) (0.209) (0.184) (0.169) (0.166) (0.163) 
0.039 0.087 0.144 0.211 0.288 0.377 0.478 0.597 0.745 
(0.606) (0.658) (0.684) (0.683) (0.656) (0.604) (0.528) (0.433) (0.32 7) 

4.8 0.620 0.630 0.644 0.662 0.687 0.718 0.759 0.814 0.889 
(0.297) (0.285) (0.277! ( 0. 2 72) (0 . 268) (0 . 264) (0 . 2 54) (0. 235) (0 . 190) 

,°* 118 v °* 185 0- 258 0- 337 0-423 0.517 0.625 0.758 
(0.836) (0.825) (0.797) (0.751) (0.691) (0.618) (0.534) (0.441) (0.343) 

5.4 0.542 0.583 0.625 0.668 0.714 0.761 0.812 0.867 0.928 
(0.596) (0.570) (0.538) (0.500) (0.45 7) (0.408) (0.349) (0.279) (0.188) 
0.076 0.151 0.228 0.305 0.385 0.467 0.554 0.651 0.771 
11*> (1.047j (0.974) (0.895) (0.812) (0.724) (0.631) (0. 532) (0. 428) 

For the mastery score = 1 enter N-xbar in the test mean column
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Table of the Raw Agreement Index and its 

S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 7
Mastery score C = 4

Test                                   KR21 =
Mean   .100    .200    .300    .400    .500    .600    .700    .800    .900

0.7   0.990   0.985   0.978   0.970   0.961   0.953   0.949   0.951   0.964
     (0.081) (0.104) (0.123) (0.136) (0.139) (0.131) (0.113) (0.091) (0.076)
      0.011   0.039   0.087   0.156   0.244   0.349   0.471   0.610   0.775
     (0.274) (0.516) (0.738) (0.901) (0.986) (0.992) (0.919) (0.772) (0.541)

1.4   0.923   0.911   0.900   0.890   0.883   0.881   0.886   0.901   0.934
     (0.251) (0.247) (0.235) (0.217) (0.195) (0.169) (0.145) (0.124) (0.103)
      0.031   0.077   0.136   0.209   0.294   0.391   0.500   0.626   0.779
     (0.537) (0.675) (0.760) (0.793) (0.780) (0.728) (0.644) (0.529) (0.376)

2.1   0.775   0.772   0.774   0.779   0.788   0.804   0.826   0.860   0.911
     (0.323) (0.296) (0.270) (0.245) (0.221) (0.199) (0.176) (0.152) (0.121)
      0.050   0.109   0.176   0.250   0.331   0.420   0.521   0.637   0.782
     (0.758) (0.779) (0.768) (0.733) (0.678) (0.607) (0.524) (0.428) (0.309)

2.8   0.608   0.630   0.654   0.680   0.710   0.744   0.784   0.832   0.897
     (0.387) (0.359) (0.331) (0.302) (0.272) (0.241) (0.209) (0.174) (0.131)
      0.064   0.130   0.200   0.274   0.353   0.438   0.533   0.643   0.784
     (0.897) (0.835) (0.768) (0.697) (0.623) (0.546) (0.466) (0.379) (0.278)

3.5   0.534   0.569   0.604   0.641   0.680   0.722   0.768   0.823   0.892
     (0.472) (0.426) (0.383) (0.342) (0.303) (0.263) (0.224) (0.182) (0.134)
      0.068   0.138   0.209   0.282   0.360   0.443   0.537   0.645   0.784
     (0.945) (0.853) (0.767) (0.685) (0.605) (0.527) (0.448) (0.365) (0.269)

4.2   0.608   0.630   0.654   0.680   0.710   0.744   0.784   0.832   0.897
     (0.387) (0.359) (0.331) (0.302) (0.272) (0.241) (0.209) (0.174) (0.131)
      0.064   0.130   0.200   0.274   0.353   0.438   0.533   0.643   0.784
     (0.897) (0.835) (0.768) (0.697) (0.623) (0.546) (0.466) (0.379) (0.278)

4.9   0.775   0.772   0.774   0.779   0.788   0.804   0.826   0.860   0.911
     (0.323) (0.296) (0.270) (0.245) (0.221) (0.199) (0.176) (0.152) (0.121)
      0.050   0.109   0.176   0.250   0.331   0.420   0.521   0.637   0.782
     (0.758) (0.779) (0.768) (0.733) (0.678) (0.607) (0.524) (0.428) (0.309)

5.6   0.923   0.911   0.900   0.890   0.883   0.881   0.886   0.901   0.934
     (0.251) (0.247) (0.235) (0.217) (0.195) (0.169) (0.145) (0.124) (0.103)
      0.031   0.077   0.136   0.209   0.294   0.391   0.500   0.626   0.779
     (0.537) (0.675) (0.760) (0.793) (0.780) (0.728) (0.644) (0.529) (0.376)

6.3   0.990   0.985   0.978   0.970   0.961   0.953   0.949   0.951   0.964
     (0.081) (0.104) (0.123) (0.136) (0.139) (0.131) (0.113) (0.091) (0.076)
      0.011   0.039   0.087   0.156   0.244   0.349   0.471   0.610   0.775
     (0.274) (0.516) (0.738) (0.901) (0.986) (0.992) (0.919) (0.772) (0.541)
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Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 7
Mastery score C = 5

Test                                   KR21 =
Mean   .100    .200    .300    .400    .500    .600    .700    .800    .900
• ^"25^ ^'Sfr c S-2Ji > ^-SSS 1 co.osoj C o.o95> co. 098) cS: SSS> C S I S«> 

• - - ■ -.9.003 0.0 4 .0.041 0.092 0. 168 0.272 0 403 0 5fi1 n 7*i 
, . C0.082>(0.249)(0:4 ? 9>t0.72l>^ 

1.4 0.986 0.979 0.971 0.961 0.949 9;938 0 929 0 92(1 n on 

' S'S?; > Sl-^ Hl-i???^ 1 J2 0> <0 - 156 > c2: "*> <o:«S><S:o84, 

fX'SJiwX'^wS'KZw 0,138 0;220 0;321 °- 443 0i586 °-76o 

(0.237) (.0.417) (0.588) (0.719) (0.791) (0.796) (0. 736) (0. 613) (0.427) 

2.1 f^ , J5w^?J?! , /S:i" ; ^'"^ 0,884 0-.376 0.875 '0.886; 0.918 

rn^JwX'Kiw 0,118 ' 0,186 0,268 0; 365 °' 476 °- 6 0> 0-767 
. . (0.443) (.9.577) (0. 672) (0. 719) (0.719) (0. 677) (0.597) (0. 486) (0. 342) 

2,3 rX , ?}JwX- ; fi?w'° ; ' ? " M ° ,;5 " a 0; 799 °'- S07 0.323 ' 0.851 0.901 

rS'S^wJ'SJcw 0,157 ' °'- 228 °- 309 °- 400 O- 50 ^ ^'623 0.774 
. . , (0,653) (0.705) W..721) (0. 706) (0. 663) (0.598) (0.515) (0.416) (0. 297) 

3.5 0.657 0. 663" 0*.;6 82- 0'.59 9 ' ;o. 721 0.748 0-.783 ' 6.828 0 892 
{ n C X'?!i> ^•287) ) CX)-.266) i C0.244) (0.221) (6vl.9<>) (0. 167) (0 131) 

0.057 0.120 0. 188 0.261 0.339 • 0'. 426 0.523 6*35 0 77R 
. (0.326) (0.795) <4>.7A9) (0.692) (0.624) (O^S) (oltJi) 'Sim) 

4.2 0.544 6. 575 ' 0-..609 ' 0.643 ! - 0.681 1 0.-722 0-.767 0- 82"* O 892 

/S'JSJx , ' 36 0,206 °- 280 0.357 0.441 0.535 b.64'4 0.78'' 
(0.93^) (0.848) (0. 768) (0.689) (0.611) (0.533) (0. 454) (0.370) (0. 274) 

4,9 rS'fJJwJ'^wJ^ 34 0,668 0;703 °"- 742 0.736 : 0.837 '0.902 
J } ( n" v55 } ?- 376 > CO. 338) (0.302) (0.265) (0.227) (0. 137) (0. 137) 

rJ'J2wM2w ff ' 209 0,283 0,361 0,446 0.539 0.648 0 785 
(0.948) (0,86.7) (0.738) (0. 710) (0.634) (0.557) (0.478) (0.394) (0. 292) 

5.6 ,«'ltL ,°* 75 \ 0fc7b2 °' 773 °' 789 0-311 0.838 0.874 0.924 
{ n nlV ( S' 3 ^ ) (0< r 288 > C0..264) (0.241) (0.218) (0. 194) (0. 166) (0. 126) 

0.057 0.121 0.191 0.267 0.348 0.437 0.535 0.647 0 786 
(0.851) (0.849) 10. 823) (0. 777) (0. 717) (0. 646) (oJol) (0.466) (O.JJs) 

6.3 ,° n 'V B ,, 0,927 °* 918 0,911 °- 908 0.-909 0.916 0.O31 0 957 

^^'J/ 2 ^^ 22 ^ 0,20 ^ 
0.034 0.084 0.149 0.227 0.315 0.412 0.520 0 642 0 787 

- — -^ifi^f":!!^ C0 * 90l3) C°-9^8) C0.941) (0.889)(0.797) (0.*665) (0^479) 
For the mastery score = 3 enter N-xbar in the test mean column
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Table of the Raw Agreement Index and its 

S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 7
Mastery score C = 6

Test                                   KR21 =
Mean   .100    .200    .300    .400    .500    .600    .700    .800    .900

0.7   1.000   1.000   0.999   0.998   0.997   0.993   0.987   0.979   0.975
     (0.001) (0.003) (0.008) (0.016) (0.030) (0.0??) (0.069) (0.077) (0.057)
      0.000   0.003   0.015   0.042   0.096   0.183   0.311   0.484   0.706
     (0.015) (0.081) (0.226) (0.446) (0.703) (0.934) (1.064) (1.021) (0.747)

1.4   0.998   0.997   0.994   0.990   0.984   0.975   0.963   0.951   0.949
     (0.016) (0.027) (0.042) (0.061) (0.083) (0.104) (0.115) (0.105) (0.074)
      0.002   0.011   0.032   0.072   0.136   0.230   0.356   0.518   0.721
     (0.064) (0.175) (0.334) (0.515) (0.678) (0.785) (0.804) (0.715) (0.506)

2.1   0.989   0.983   0.976   0.966   0.954   0.940   0.925   0.91?   0.925
     (0.072) (0.092) (0.113) (0.133) (0.149) (0.155) (0.146) (0.117) (0.086)
      0.008   0.025   0.058   0.109   0.182   0.278   0.399   0.54?   0.734
     (0.166) (0.305) (0.455) (0.588) (0.680) (0.712) (0.676) (0.571) (0.399)

2.8   0.953   0.942   0.929   0.915   0.900   0.887   0.877   0.878   0.904
     (0.181) (0.196) (0.205) (0.208) (0.201) (0.183) (0.155) (0.120) (0.100)
      0.018   0.047   0.092   0.152   0.229   0.324   0.439   0.575   0.746
     (0.322) (0.454) (0.565) (0.641) (0.672) (0.654) (0.589) (0.482) (0.338)

3.5   0.869   0.856   0.843   0.832   0.824   0.821   0.826   0.844   0.890
     (0.292) (0.281) (0.264) (0.241) (0.214) (0.184) (0.155) (0.132) (0.117)
      0.032   0.075   0.130   0.196   0.275   0.367   0.474   0.599   0.756
     (0.513) (0.599) (0.652) (0.670) (0.653) (0.604) (0.526) (0.424) (0.302)

4.2   0.728   0.726   0.727   0.731   0.741   0.757   0.783   0.821   0.884
     (0.315) (0.287) (0.262) (0.238) (0.217) (0.197) (0.179) (0.161) (0.136)
      0.048   0.103   0.166   0.237   0.316   0.404   0.504   0.620   0.766
     (0.712) (0.727) (0.717) (0.685) (0.634) (0.566) (0.485) (0.392) (0.286)

4.9   0.578   0.600   0.625   0.653   0.684   0.721   0.765   0.818   0.889
     (0.362) (0.344) (0.325) (0.304) (0.282) (0.257) (0.229) (0.196) (0.150)
      0.062   0.128   0.197   0.270   0.348   0.433   0.527   0.636   0.774
     (0.884) (0.829) (0.767) (0.700) (0.629) (0.554) (0.474) (0.388) (0.290)

5.6   0.548   0.584   0.621   0.659   0.699   0.742   0.789   0.843   0.909
     (0.513) (0.467) (0.423) (0.382) (0.341) (0.301) (0.259) (0.213) (0.153)
      0.071   0.142   0.215   0.289   0.368   0.451   0.543   0.649   0.781
     (0.990) (0.904) (0.822) (0.744) (0.668) (0.592) (0.513) (0.427) (0.323)

6.3   0.753   0.764   0.777   0.794   0.815   0.839   0.868   0.903   0.946
     (0.376) (0.349) (0.324) (0.300) (0.276) (0.251) (0.222) (0.185) (0.131)
      0.065   0.135   0.209   0.286   0.368   0.456   0.551   0.657   0.786
     (0.977) (0.972) (0.944) (0.900) (0.841) (0.769) (0.681) (0.573) (0.433)

For the mastery score = 2 enter N-xbar in the test mean column
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Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 7
Mastery score C = 7

Test                                   KR21 =
Mean   .100    .200    .300    .400    .500    .600    .700    .800    .900

0,7 ,n*2!!!!w 1,000 1 - 000 1 - 000 ^O 00 °- 999 ™996H>799o""oT98l" 
(0. 000) (0. 000) (0. 001) (0. 002) (0. 006) (0. 01'.) (0.031) (0.057) (0.064) 
O-OOO 0.000 0.003 0.012 0.036 0.088 0.184 0.347 0.604 
(0.001) (0.014) (0.060) (0.168) (0.356) (0 . 61 6) (0. 893) (1 . 068) (0. 940) 

1.4 , 1 ' 000 °- 999 °- 999 0.997 0.994 0.987 0.974 0.958 
^'S?^ (°- 003 > (0.006) (0.011) (0.022) (0.040) (0.066) (0.093) (0.084) 

/S'SSSwS' 002 0,007 0 * 023 °- 056 °' 118 O- 2 " 0.386 0.627 
(0.008) (0.038) (0.107) (0.227) (0.392) (0.578) (0.736) (0.790) (0.644) 

2.1 rS'JSwS'Kfw 0,997 0,994 °- 990 °' 982 °' 969 0.950 0.934 
(0.009) (0.014) (0.023) (0.036) (0.055) (0.080) (0.108) (0.122) (0.091) 
,?/2?nw 0,005 °- 016 °- 040 0 ' 083 O- 155 0.265 0.425 0.649 
(0.030) (0.085) (0.179) (0.307) (0.453) (0.5^8) (0.672) (0.660) (0.508) 

2,3 /S'nSwJ'Siw 0,988 0,982 °* 972 °' 959 °- 940 °' 919 0.909 
( ^2^ )(0,048)(0 • 064)(0 • 085) ( 0 • 1 0 9 )(0• , 32 > (0-148) (0.139) (0.092) 

/!'!!?w!!'?Hx 0,031 0,064 0a18 °' 198 0- 311 °«* 6 * 0.670 
(0. 078) (0. 162) (0. 272) (0. 396) (0. 514) f'o. 600) (0.628) (0.574) (0.425) 

3.5 ^'IIL,^ 972 0,963 °' 951 °' 936 0- 918 0.898 0.880 0.886 
( S , J 98)( ?- n ;j ) (°- 137 )(0- 1 57) (0.173) (0.181) (0.172) (0.138) (0.096) 

/S , ?!!w° ,0 i 5 , 0,054 0,098 °- 160 °' 246 °' 358 0.502 oW 
(0.168) (0.271) (0.382) (0.486) (0.566) (0.604) (0.589) (0.510) (0.370) 

4.2 ,n'Ul,,°' 922 °- 908 °' 892 0.875 0.857 0.844 0.841 0.870 
( n , n?n )( J , nJ? )(0,227)(0,229)(0 - 222 >(°-203)(0.170)(0.128)(0.114> 

0.018 0.045 0.085 0.139 0.209 0.297 0.406 0.539 0 709 
(0.308) (0.410) (0.500) (0 . 568) (0 . 604) (0 . 600) (0 . 552 ) (0.461) (o!335) 

4,9 ,«'UL, 0,821 0,308 °* 796 °' 787 0- 783 0- 788 0.810 0.865 
( ? ,30 ? )(0 - 2 " ) ( 0 - 27 5) (0.250) (0.219) (0. 186) (0.157) (0. 144) (0. 148) 
0.032 0.073 0.124 0.187 0.262 0.350 0.453 0.575 0 728 
(0.501) (0.5 72) (0.620) (0.641) (0.632) (0 . 593) (0. 524M0. 430) (0.318) 

5.6 /a'SSIwS' 668 0,673 °* 683 °- 699 0.722 0.755 0.804 0.878 
^ 2 ?^ ^ 271) < 0 - 251 >( 0 - 237 >(0. 228) (0.224) (0.22n^ 

/X'SJJwi'JSf* 0,170 °- 240 °' 318 °' 404 °-* 9 * 0.609 0.745 
(0.743) (0.754) (0.744) (0.715) (0.666) (0.601) (0.521) (0.428) (0.323) 

6.3 /J*c 3 t ° ,57 ?w v °* 653 °- 697 0. 744 0.796 0.853 0.919 
( ?^ 17) (0- 5 y^(°- 48 5) (0. 459) (0.428) (0.389) (0.341) (0.278) (0.193) 

"•J", °' 145 0.219 0. 295 0.374 0.456 0.543 0.641 0.761 
(1^043) (0.985) (0. 920) (0.848) (0.770) (0.687) (0.599) (0.504) (0.402) 

For the mastery score = 1 enter N-xbar in the test mean column
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Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 8
Mastery score C = 4

Test                                   KR21 =
Mean   .100    .200    .300    .400    .500    .600    .700    .800    .900



0.8 0.984 0.977 0.968 0.959 0.950 0.943 0.940 0.944 0.961
(0.112) (0.133) (0.149) (0.155) (0.152) (0.139) (0.118) (0.097) (0.080)
0.015 0.048 0.100 0.171 0.259 0.363 0.481 0.615 0.773
(0.334) (0.568) (0.763) (0.892) (0.947) (0.931) (0.852) (0.712) (0.?02)

1.6 0.881 0.871 0.862 0.856 0.854 0.858 0.869 0.890 0.928
(0.290) (0.273) (0.251) (0.227) (0.202) (0.177) (0.154) (0.133) (0.107)
0.039 0.090 0.153 0.227 0.311 0.404 0.509 0.629 0.776
(0.627) (0.724) (0.770) (0.773) (0.741) (0.680) (0.595) (0.488) (0.350)

2.4 0.693 0.703 0.715 0.731 0.751 0.776 0.807 0.848 0.905
(0.342) (0.317) (0.293) (0.268) (0.244) (0.218) (0.191) (0.161) (0.123)
0.058 0.122 0.190 0.264 0.343 0.429 0.525 0.637 0.778
(0.833) (0.807) (0.765) (0.709) (0.643) (0.570) (0.488) (0.398) (0.290)

3.2 0.549 0.581 0.615 0.649 0.686 0.726 0.771 0.824 0.892
(0.451) (0.409) (0.369) (0.331) (0.293) (0.256) (0.217) (0.177) (0.130)
0.067 0.136 0.206 0.279 0.356 0.439 0.532 0.640 0.778
(0.923) (0.838) (0.756) (0.677) (0.600) (0.522) (0.444) (0.360) (0.264)

4.0 0.564 0.592 0.622 0.653 0.688 0.726 0.769 0.821 0.889
(0.414) (0.381) (0.348) (0.315) (0.281) (0.247) (0.212) (0.173) (0.128)
0.065 0.133 0.202 0.275 0.352 0.436 0.529 0.637 0.777
(0.901) (0.825) (0.749) (0.673) (0.597) (0.520) (0.440) (0.356) (0.260)

4.8 0.714 0.717 0.724 0.735 0.751 0.771 0.799 0.838 0.896
(0.324) (0.299) (0.275) (0.252) (0.229) (0.206) (0.181) (0.154) (0.120)
0.054 0.114 0.180 0.253 0.332 0.419 0.516 0.630 0.774
(0.777) (0.769) (0.739) (0.691) (0.630) (0.557) (0.474) (0.382) (0.275)

5.6 0.878 0.866 0.855 0.847 0.843 0.844 0.852 0.872 0.913
(0.290) (0.275) (0.255) (0.232) (0.206) (0.179) (0.153) (0.130) (0.107)
0.035 0.083 0.143 0.215 0.297 0.389 0.495 0.617 0.770
(0.572) (0.665) (0.713) (0.720) (0.691) (0.631) (0.547) (0.442) (0.313)

6.4 0.971 0.962 0.951 0.939 0.928 0.918 0.912 0.915 0.937
(0.147) (0.165) (0.177) (0.181) (0.176) (0.161) (0.137) (0.109) (0.0?8)
0.017 0.049 0.098 0.164 0.248 0.348 0.464 0.600 0.764
(0.330) (0.507) (0.652) (0.745) (0.778) (0.753) (0.678) (0.557) (0.388)

7.2 0.998 0.996 0.992 0.987 0.981 0.973 0.965 0.961 0.967
(0.025) (0.040) (0.059) (0.080) (0.098) (0.108) (0.104) (0.085) (0.065)
0.004 0.019 0.053 0.109 0.191 0.296 0.425 0.576 0.756
(0.119) (0.312) (0.548) (0.767) (0.924) (0.990) (0.955) (0.817) (0.568)



For the mastery score = 5 enter N-xbar in the test mean column

RELIABILITY IN MASTERY TESTING

Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 8
Mastery score C = 5

Test KR21-

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900



0.8 0.998 0.996 0.992 0.987 0.981 0.973 0.965 0.961 0.967 
(0.025) (0.040) (0.059) (0.080) (0.098) (0.108) (0.104) (0.085) (0.065) 
0.004 0.019 0.053 0.109 0.191 0.296 0.425 0.576 0.756
(0.119) (0.312) (0.548) (0.767) (0.924) (0.990) (0.955) (0.817) (0.568)

1.6 0.971 0.962 0.951 0.939 0.928 0.918 0.912 0.915 0.937
(0.147) (0.165) (0.177) (0.181) (0.176) (0.161) (0.137) (0.109) (0.0?8)
0.017 0.049 0.098 0.164 0.248 0.348 0.464 0.600 0.764
(0.330) (0.507) (0.652) (0.745) (0.778) (0.753) (0.678) (0.557) (0.388)

2.4 0.878 0.866 0.855 0.847 0.843 0.844 0.852 0.872 0.913
(0.290) (0.275) (0.255) (0.232) (0.206) (0.179) (0.153) (0.130) (0.107)
0.035 0.083 0.143 0.215 0.297 0.389 0.495 0.617 0.770
(0.572) (0.665) (0.713) (0.720) (0.691) (0.631) (0.547) (0.442) (0.313)

3.2 0.714 0.717 0.724 0.735 0.751 0.771 0.799 0.838 0.896
(0.324) (0.299) (0.275) (0.252) (0.229) (0.206) (0.181) (0.154) (0.120)
0.054 0.114 0.180 0.253 0.332 0.419 0.516 0.630 0.774
(0.777) (0.769) (0.739) (0.691) (0.630) (0.557) (0.474) (0.382) (0.275)

4.0 0.564 0.592 0.622 0.653 0.688 0.726 0.769 0.821 0.889
(0.414) (0.381) (0.348) (0.315) (0.281) (0.247) (0.212) (0.173) (0.128)
0.065 0.133 0.202 0.275 0.352 0.436 0.529 0.637 0.777
(0.901) (0.825) (0.749) (0.673) (0.597) (0.520) (0.440) (0.356) (0.260)

4.8 0.549 0.581 0.615 0.649 0.686 0.726 0.771 0.824 0.892
(0.451) (0.409) (0.369) (0.331) (0.293) (0.256) (0.217) (0.177) (0.130)
0.067 0.136 0.206 0.279 0.356 0.439 0.532 0.640 0.778
(0.923) (0.838) (0.756) (0.677) (0.600) (0.522) (0.444) (0.360) (0.264)

5.6 0.693 0.703 0.715 0.731 0.751 0.776 0.807 0.848 0.905
(0.342) (0.317) (0.293) (0.268) (0.244) (0.218) (0.191) (0.161) (0.123)
0.058 0.122 0.190 0.264 0.343 0.429 0.525 0.637 0.778
(0.833) (0.807) (0.765) (0.709) (0.643) (0.570) (0.488) (0.398) (0.290)

6.4 0.881 0.871 0.862 0.856 0.854 0.858 0.869 0.890 0.928
(0.290) (0.273) (0.251) (0.227) (0.202) (0.177) (0.154) (0.133) (0.107)
0.039 0.090 0.153 0.227 0.311 0.404 0.509 0.629 0.776
(0.627) (0.724) (0.770) (0.773) (0.741) (0.680) (0.595) (0.488) (0.350)

7.2 0.984 0.977 0.968 0.959 0.950 0.943 0.940 0.944 0.961
(0.112) (0.133) (0.149) (0.155) (0.152) (0.139) (0.118) (0.097) (0.080)
0.015 0.048 0.100 0.171 0.259 0.363 0.481 0.615 0.773
(0.334) (0.568) (0.763) (0.892) (0.947) (0.931) (0.852) (0.712) (0.?02)

For the mastery score = 4 enter N-xbar in the test mean column
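
The point values in these tables (not the standard errors) can be approximated with a short computation. The sketch below is illustrative only and is not the authors' program: the function name `agreement_indices` is hypothetical, and it assumes the usual beta-binomial moment relations alpha + beta = N(1/KR21 - 1) and alpha = mean(1/KR21 - 1), with scores on two parallel forms independent given the true score. Under those assumptions the raw agreement index is the probability that both forms give the same mastery decision, and kappa corrects it for chance agreement.

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def agreement_indices(n_items: int, cutoff: int, mean: float, kr21: float):
    """Return (raw agreement, kappa) for two parallel n_items-item forms.

    Hypothetical helper; assumes 0 < kr21 < 1 and 0 < mean < n_items.
    """
    # Assumed moment relations for the beta-binomial parameters.
    scale = 1.0 / kr21 - 1.0
    alpha = mean * scale
    beta = n_items * scale - alpha
    lb = log_beta(alpha, beta)

    # p0: probability that a score on one form falls below the mastery score.
    p0 = sum(
        comb(n_items, x) * exp(log_beta(alpha + x, beta + n_items - x) - lb)
        for x in range(cutoff)
    )
    # p00: probability that the scores on both forms fall below the mastery score.
    p00 = sum(
        comb(n_items, x) * comb(n_items, y)
        * exp(log_beta(alpha + x + y, beta + 2 * n_items - x - y) - lb)
        for x in range(cutoff)
        for y in range(cutoff)
    )
    raw = 1.0 - 2.0 * p0 + 2.0 * p00          # p00 + p11
    chance = p0 ** 2 + (1.0 - p0) ** 2        # agreement expected by chance
    kappa = (raw - chance) / (1.0 - chance)
    return raw, kappa


raw, kappa = agreement_indices(8, 5, 4.0, 0.5)
print(f"{raw:.3f} {kappa:.3f}")
```

For N = 8, C = 5, test mean 4.0 and KR21 = .500 the table above lists 0.688 and 0.352, which gives a convenient check on the sketch. The footnote rule also follows from the computation: replacing C by N + 1 - C and the mean by N - xbar swaps the mastery and nonmastery categories and leaves both indices unchanged.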
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Table of the Raw Agreement Index and Its 

S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 8
Mastery score C = 6



Test KR21- 

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900 



0.8 1.000 0.999 0.999 0.997 0.994 0.989 0.982 0.974 0.972 
(0.003) (0.008) (0.016) (0 . 0 28) (0. 046) (0 . ObC) (0 . 082) (0.080) (0. 057) 

0.001 0.006 0.023 0.060 0.124 0.222 0.354 0.521 0.727 
(0.029) (0.128) (0.312) (0.552) (0.791) (0.967) (1.025) (0.930) (0.656) 

1.6 0.996 0.992 0.988 0.931 0.972 0.96. 0.948 0.939 0.945 
(0.038) (0.055) (0.0 75) (0.09 7) (0.116) (0.1. 3) (0.126) (0.105) (0.075) 
0.005 0.019 0.050 0.100 0.175 0.275 0.400 0.553 C.740 
(0.121) (0.270) (0.448) (0.615) (0.737) (0.788) (0.757) (0.642) (0.444) 

2.4 0.970 0.960 0.949 0.936 0.923 0.910 0.900 0.899 0.920 
(0.143) (0.162) (0. 176) (0.184) (0. 183) (0.171) (0. 147) (0.115) (0.090) 

0.015 0.043 0.087 0.148 0.227 0.325 G.442 0.580 0.750 
(0. 286) (0.438) (0 . 572 ) (0 . 6 64) (0.705) (0.690) (0.6 22) (0.5 07) (0.350) 

3.2 0.392 0.879 0.866 0.855 0.846 0.842 0.845 0.861 0.901 
(0.27 5) (0.2 6G) (0. 254) (0 . 235) (0 . 2 10) (0 . 182) (0 . 153) (0 . 1 27) (0. 106) 

0.030 0.073 0.128 0.196 0.276 0.369 0.477 0.602 0.759 
(0.49 7) (0.5 97) (0.659) (0.682) (0.665) (0.614) (0.533) (0.4 28) (0.299) 

4.0 0.747 0.744 0.745 0.749 0.758 0.772 C.796 0.831 0.889 
(0.317) (0.290) (0.265) (0.240) (0.217) (0. 195) (0.173) (0. 150) (0. 121) 

0.043 0.103 0.167 0.238 0,317 0.405 0.504 0.620 0.767 
(0.706) (0.723) (0.713) (0.679) (0.627) (0.557) (0.475) (0.381) (0. 272) 

4.8 0.588 0.609 0.633 0.660 0.691 0.726 0.767 0.818 0.886 
(0.36 5) (0.342) (0. 313) (0.294) (0.268) (0 . 240) (0 . 2 10) (0 . 1 75) (0. 133) 

0.062 0.127 0.196 0.268 0.346 0.430 0.523 0.633 0.772 
(0.866) (0.308) (0.744) (0.67i) (0.603) (0.5 27) (0.4*7) (0.362) (0.2 65) 

5.6 0.540 0.574 0.610 0.646 0.685 0.727 0.773 0.827 0.895 
(0.476) (0.430) (0.388) (0.T47) (0.308) (0.269) (0.229) (0.187) (0. 137) 

0.069 0.138 0.209 0.232 0.359 0.442 0.534 0.641 0.777 
(0.940) (0.352) (0.769) (0.689) (0.612) (0.536) (0.458) (0.375) (0.278) 

6.4 0.C37 0.701 0.717 0.737 0.760 0.733 0.821 0.863 0.917 
(0,370) (0.343) (0.316) (0.2 89) (0. 263) (0.235) (0 . 20 6) (0 . 1 74) (0 . 129) 

0.062 0.129 0.199 0.274 0.353 0.439 0.534 0.643 0.780 
(0.089) (0.853) (0.805) (0 . 746) (0 . 680) (0 . 608) (0 . 528) (0.437) (0.323) 

7.2 0.^15 0.904 0.896 0.891 0.890 0..894 0.904 0.923 0.952 
(0.267) (0.2 53> (0. 233) (0. 2 11) (0. 133) (0. 166) (0. 148) (0. 130) (0. 102) 

0.039 0.093 0.159 0.237 0.323 0.418 0.522 0.640 0.781 
(0.668) (0.809) (0.836) (0. 008) (0.887) (0.830) (0.741) (0.619) (0.450) 



For the mastery score = 3 enter N-xbar in the test mean column

Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 8
Mastery score C = 7



Test KR21- 

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900 

0.8 1.000 1.000 1.000 1.000 0.999 0.997 0.992 C.)85~(K977~ 
(0.000) (0.001) (0.003) (0.006) (0.014) (0 . 029) (0 . 050) (C . 068) (0 . 05 7) 

0.000 0.001 0.007 0.025 0.066 0.142 0.264 0.440 0.677 
(0.005) (0.036) (0. 129) (0. 305) (0. 551) (0 . 815) (1 . 009) (1.031) (0.780) 

1.6 1.000 0.999 0.998 0.996 0.992 0.985 0.975 0.961 0.953 
(0.005) (0.0 10) (0.019) (0.031) (0 . 050) (0 . 073) (0.096) (0 . 102) (0 . 073) 

0.001 0.005 0.018 0.048 0.101 0.187 0.311 0.478 0.695 
(0.027) (0.097) (0.222) (0.394) (0.577) (0.726) (0.792) (0 . 734) (0 . 52 7) 

2.4 0.596 0.993 0.989 0.982 0.973 0.960 0.945 0.929 0.9?8 
(0.034) (0.048) (0.066) (0.088) (0.110) (0. 129) (0. 136) (0, 119) (0.081) 

0.004 0.015 0.038 0.080 0.144 0.236 0.358 0.514 0.712 
(0.091) (0.201) (0.343) (0.493) (0.6 18) (0.690) (0.684) (0. 591) (0.412) 

3 * 2 ,°/? 7 ?w° >Q69 °' 959 °' 947 °' 932 °- 916 0.900 0.891 0.905 
(0.112) (0.133) (0.152) (0. 168X0. 177)<0. 175) (0.157) (0. 122) (0.090) 

/S'JJJw 0,032 0,068 °' 121 °' 193 0- 287 0.404 0.547 0.727 
(0.212) (0.342) (0.470) (0 . 576) (0 . 641 ) (0 . 652) (0 . 604) (0 . 499) (0 . 345) 

4 *° /J'!?!, 0,907 0,892 °' 878 °' 864 °- 8 52 0.847 0.854 0.888 
( ;*^ 7) (°- 242 >(0- 24 l) (0-232) (0.215) (0. 189) (0.156) (0. 124) (0. 106) 

, 0,058 °* 105 °' 167 °' 244 °- 336 0.446 0. 576 0.740 

(0.389) (0.499) (0.583) (0.632) (0.641) (0.610) (0.539) (0.435) (0.303) 

4.8 u.798 0.788 0.731 0.776 0,775 0.780 0.795 0.824 0.879 
( ^ 3 J 7) < 0 - 293 >(°- 266 > (0.239) (0.211) (0. 185) (0.162) (0. 144) (0.126) 

0.039 0.038 0.146 0.."14 0.292 0.381 0.483 0.602 0. 752 
(0.599) (0.650) (0,671) (0.6 63) (0.628) (0.570) (0.490) (0. 394) (0 . 282) 

5 * 6 /J'SJSwS'SIiiw 0,655 °' 673 °' 697 °' 726 0- 763 0.313 0.882 
(0.313) (0.295) (0.279) (0.2 63) (0.247) (0.229) (0.208) (0. 133) (0 . 145) 

✓ S'SJicw 0,118 °* 184 °' 256 °' 333 0- 418 0.513 0. 623 0.763 
(0.805) (0.7 77) (0.736) (0.682) (0.619) (0.547) (0.468) (0 . 380) (0 . 280) 

6.4 0.535 0.570 0.606 0.644 0.685 0.728 0.776 0.812 0.901 
(°- 482 > (0-444) (0.406) (°- 3 69) (0.332) (0.295) (0.256) (0.211) (0. 154) 

,°/2c?w 0,139 0,210 °' 284 °' 361 °- 444 0.535 0. 640 0. 772 
(0.956) (0.874) (0. 795) (0. 7 19) (0 . 645) (0.570) (0.493) (0.4 ,3) (0.309) 

7,2 /° ,7 }°w 0,727 0,746 °' 768 °' 793 °- 821 0-354 0.893 0.940 
(0.^ 10) (0.379) (0.351) (0.322) (0.294) (0.2C5) (0.233) (0. 193) (0 . 136) 
0.067 0.138 0.211 0.288 0.369 0.454 0.547 0.651 0.779 
l°:? 81 ^°- 952) (0.909) (0.855) (0.794) (0, 723) (0.640) (0.540) (0.410) 

For the mastery score = 2 enter N-xbar in the test mean column

Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 8
Mastery score C = 8



Test KR21- 

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900 



0.8 1.000 1.000 1.000 1.000 1.000 0.999 0.998 0.994 0.984 
(0.000) (0.C00) (0.000) (0.001) (0.002) (0.0C) (0.019) (0.043) (0.063) 

0.000 0.000 0.001 0.007 0.023 0.063 0.147 0.302 0.566 
(0.000) (0.005) (0.030) (0.101) (0.249) (0.486) (0.780) (1.018) (0.959) 

1.6 1.000 1.000 1.000 1.000 0.999 0.997 0.992 0.982 0.963 
(0.000) (0.001) (0.002) (0.005) (0.011) (0.023) (0.046) (0.0 78) (0.086) 
0.000 0.001 0.004 0.014 0.038 0.089 0.184 0.343 0.593 
(0.003) (0.018^ (0.062) (0 . 1 5 2) (0 . 29 7) (0 . 4 85) (0 . 67 1 ) (0 . 7 72) (0.660) 

2.4 1.000 0.999 0.999 0.997 0.995 0.990 0.980 0.962 0.940 
(0.003) (0.006) (0.011) (0.019) (0.033) (0.0 55) (0.084) (0.111) (0.096) 
0.000 0.003 0.01J 0.026 C 060 0.123 0.226 0.385 0.619 
(0.014) (0.048) (0. 117) (0.22 6) (C 3 68) (0.519) (0.636) (0.658) (0.521) 

3.2 0.998 0.996 0.994 0.990 0.983 0.973 0.956 0.933 0.914 
(0.017) (0.025) (0.03 7) (0.054) (0 . 0 76) (0 . 1 0 3) (0.128) (0.137) (0.096) 

0.002 0.007 0.020 0.046 0.091 0.164 0.273 0.427 0.644 
(0.044) (0.105) (0.19 8) (0.317) (0.444) (0.554) (0 . 61 1) (0 . 581) (0. 435) 

4.0 0.989 0.984 0.978 0.969 0.957 0.940 0.918 0,895 0.889 
(0.060) (0.076) (0.096) (0. 118) (0.140) (0.159) (0.16 6) (0.145) (0.093) 

0.005 0.017 0.039 0.076 0.132 0.212 0.323 0.471 0.668 
(0.110) (0.199) (0.305) (0.416) (0.514) (0.5 77) (0.585) (0.519) (0.376) 

4.8 0.959 0.949 0.936 0.922 0.904 0.884 0.865 0.853 0.869
(0. 152) (0. 170) (0. 18 7) (0.200) (0.206) (0.2 00) (0. 177) (0. 134) (0.10 2) 

0.013 0.035 0.068 0.116 0.181 0.267 0.376 0.513 0.691 
(0.230) (0.331) (0.429) (0 . 5 1 3) ( 0 . 5 70) (0.586) (0.554) (0.468) (0.335) 

5.6 P. 378 0.863 0.848 0.833 0.813 0.807 0.803 0.814 0.859 
(0.277) (0.276) (0.268) (C. 2 52) (0.228) (0.196) (0.1 59) (0.1 31) (0.133) 

0.025 0.061 0.107 0.166 0.238 0.326 0.430 0.555 0.712 
(0.413) (0.497) (0.562) (0.601) (0.610) (0.585) (0.52 5) (0.431) (0.313) 

6.4 0. 7 13 n 708 0.706 0.703 0.715 0.730 0.756 0.798 0.8fa9 
(0.310) (0. 231) (0 .25 3) (0.2 28) (0. 209) (0. 197) (0. 194) (0. 193) (0. 17 6) 

0.044 0.096 0.156 0.224 0.300 0.386 0.483 0.594 0.733 
(0.661) (0.691) (0.699) (0.684) (0.647) (0. 590) (0 . 5 13) (0 . 420) (0 . 3 1 7) 

7.2 0.539 0.571 0.606 0.643 0.685 0.731 0.782 0.841 0.911 
(0.444) (0.440) (0.431) (0.417) (0.39 6) (0.367) (0.329) (0.2 75) (0.195) 

0.068 0.138 0.211 0.286 0.364 0.446 0.534 0.631 0.752 
(0.981) (0.9 34) (0.87 7) (0.8 ) (0 . 739) (0 . 660) (0 . 575) (0 . 482) (0.382) 



For the mastery score = 1 enter N-xbar in the test mean column

Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 9
Mastery score C = 5

Test KR21-

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900

0,9 ,!'Sw 0,993 0,988 °' 982 °' 974 0.966""o"958~"o~95r"o~96r 
( ?'"9) (0.058) (0.079) (0.099) (0. 1 14) (0 . 1 19) (0 . 1 10) (0 . 088) (0 . 068) 

fJ'?!Sw! - !«% 0,063 0,124 0,208 0,314 0,440 °' 585 0-758 
(0. 159) (0.367) (0.597) (0.791) (0 . 9 14) (0 . 9 49) (0 . 896 ) (0 . 758) (0 . 528) 

1,8 ^!;L n 0, "? w °' 927 0,915 °' 905 0.898 0.896 0.905 0.932 
< ^J? 9)( 0- 2 ?9)(0.2\0)(0.204)(0. 189)(0.167)(0. 140) (0. 114) (0.093) 

rX*??JwJ*!Siw °* Ub 0,185 0,269 0,367 °' 479 °' 607 0. 764 
(0.413) (0.577) (0.691) (0.750) (0. 755) (0.712) (0.631) (0.515) (0.361) 

2,7 0,805 0,801 0,801 O- 805 0.814 0.831 0.860 0.907 

(0.315) (0.290) (0.264) (0.238) (0.213) (0.188) (0.164) (0.139) (0.110) 

/£ - 2J?wJ-!J!, 0, 163 0,235 0,316 0,405 0.506 0.622 0. 769 
(0. 676) (0. 722) (0. 730) (0. 708) (0. 660) (0.S93) (0.509) (0.411) (0.293) 

3,6 /J*S35w2* 643 0,663 0- 687 °' 714 0.745 0.782 0.828 0.892 
(0.362) (0.336) (0.311) (0.234) (0. 257) (0.228) (0.197) (0. 163) (0. 121) 

/J'SSJw 0,126 0, 194 0,267 0,344 °' 428 0- 522 0. 631 0. 771 
(0.852) (0.300) (0.740) (0.674) (0.603) (0.527) (0.447) (0.360) (0.260) 

4,5 /J* 53 ^, 0,568 0,603 0,639 °' 677 0- 718 0.764 0.817 0.886 
^•"J^ 0 - 4 ") (0.370) (0.331) (0.292) (0.253) (0.214) (0. 172) (0. 125) 
0.067 u.136 0.205 0.278 0.354 0.436 0.527 0 634 0 772 

(0.913)(0.824)(0.741)(0.661)(0.583)(0.506)(0.428)(0.*345)(0.*251) 

5 * 4 /X*«Jw!*SiJ, 0,663 0,687 0, 714 0,745 O- 782 0.828 0.892 
< 2 ,362)( S ,336)(0 - 311 H0.284)(0.257)(0.223)(0.197)( U . 163) (0.121) 

0.061 0.126 0.194 0.267 0.344 0.428 0.522 0 631 0 771 
(0.352) (0.800) (0.740) (0.674) (0.603) (0.527) (0.447) (0.*360) (0.*260) 

6.3 0.812 0.805 0.801 0.801 0.805 0.814 0.831 0.360 0 907 

( ^n^ )( ^no2 )( ^ 264)<0,238)(0,213 H0. 188)(0.164)(0. 139)(0:il0) 

rJ'?EwX'!J2w°-!f 3 0,235 °' 316 0,405 0,506 0.622 0.769 
(0.676) (0.722) (0.730) (0.708) (0.660) (0.593) (0.509) (0.411) (0.293) 

7,2 ,° n 'V n L,° n 'V 9 0,927 0,915 °' 905 °' 89C 0.896 0.905 0.932 
( S , J21 )( n , n2? )< 2 ,2 f 0)<0,204)(0 ' 189)(0, 167 ^0 ,14 0H0. 114) (0.093) 
,°/? 2 „* ,061 0, 116 0- 185 0- 2S9 0-367 0.479 -. 607 0. 764 
(0.418) (0.577) (0.691) (0.750) (0.755) (0.7: 2) (0.631) (0.515) (0.2 61) 

8,1 /J'!!!!,, 0,993 0,988 0,982 °- 974 0.966 0.958 0.955 0.964 
<0,039)(0,058) <0.0 7 9)(0.099)(0.114)(0.119)(0. 110) (0.088) (0.068) 
0.006 0.025 0.063 0.124 0.208 0.314 0.440 0 585 0 758 

(0. 159) (0. 367) (0. 597) (0. 791) (0. 914) (0. 949) (0. 896) (0 758) (0.528) 
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Table of the Raw Agreement Index and Its 

S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 9
Mastery score C = 6



Test KR21- 

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900 



0.9 1.000 0.999 0.998 0.995 0.991 0.985 0.977 0.969 0.970 
(0.007) (0.014) (0.025) (0.041) (0.061) (0.0C") (0. 091) (0.082) (0.058) 
0.001 0.009 0.031 0.075 0.146 0.248 0.381 0.542 0.737 
(0.047) (0. 174) (0.380) (C.620) (0.831) (0.961) (0.975) (0.857) (0.598) 

1.8 0.990 0.985 0.978 0.968 0.957 0.945 0.934 0.929 0.940 
(0.0 70) (0.091) (0.112) (0.131) (0.143) (0.145) (0. 132) (0.105) (0.079) 

0.008 0.029 0.067 0.125 0.205 0.306 0.428 0.572 0.748 
(0.186) (0.356) (0.530) (0 . 6 7 1) (0 . 754) (0 . 7 66) (0 . 708) (0 . 587) (0.404) 

2.7 0.939 0.927 0.914 0.901 0.889 0.880 0.877 0.885 0.915 
(0.215) (0.2 23) (0.223) (0.215) t.0. 199) (0.176) (0. 148) (0.118) (0.095) 
0.023 0.060 0.il2 0.179 0.260 0.356 0.467 0.596 0.756 
(0.405) (0.542) (0.641) (0 . 6 93) (0 . 69 6) (0 . 6 54) (0.574) (0.462) (0.320) 

3.6 0.811 0.302 0.796 0.794 0.796 0.304 0.819 0.847 0.396 
(0.314) (0.290) (0.264) (0 . 23 3) (0 . 2 12) (0.136) (0.162) (0.1 37) (0.110) 
0.042 0.094 0.156 0.227 0.307 0.396 0.497 0.615 0.763 
(0.640) (0.688) (0.702) (0.684) (0.640) (0.5 74) (0.490) (0.392) (0.2 76) 

4.5 0.633 0.648 0.665 0.'86 0.711 0.740 0.776 0.822 0.886 
(0.3 39) (0.317) (0.295) (0.2 7 2) (0.2 48) (0.2 22) (0. 19 3) (0.161) (0. 122) 
0.059 0.122 0.1C9 0,261 0.339 0.423 0.517 0.627 0.768 
(0.324) (0.7 82) (0.729) (0.667) (0.59 9) (0.5 24) (0.44 3) (0.3 55) (0.256) 

5.4 0.534 0.568 0.603 (.639 0.677 0.718 0.764 0.818 0.887 
(0.45 5) (0.412) (0.371) (0.332) (0.2 93) (0.2 55) (0.216) (0.175) (0. 128) 

0.067 0.135 0.205 0.278 0.354 0.436 0.527 0.634 0.772 
(0.913) (0.826) (0.743) (0 . 664) ( 0 . 587) (0 . 5 10) (0.43 2) (P. 349) (0.255) 

6.3 0.624 0.644 0.667 0.692 0.721 0.753 0.791 0.837 0.899 
(0.335) (0.356) (0.326) (0 . 2 9 7) (0 . 2 67) (0 . 236) (0 . 20 3) (0. 1 68) (0 . 125) 

0.063 0.130 0.199 0.272 0.350 0.433 0.527 0.635 0.773 
(0.8 78) (0.820) (0.7 56) (0.689) (0.617) (0.542) (0 46 3) ( 0 . 3 7 7) (0 . 2 76) 

7.2 3.83', n 827 0.822 0.822 0.326 0.836 0.852 0.879 0.923 
(0.311) (0.J 36) (0.261) (0.236) (0.2 11) (0.187) (0.164) (0.141) (0.111) 

0.045 0.102 0.167 0.241 0.323 0.413 0.514 0.630 0.773 
(0.700) (0.7 56) (0. 7 71) (0.7 52) (0.707) (0.6 40) (0.55 7) (0.457) (0.330) 

8.1 0.976 0.967 0.957 0.947 0.933 0.932 0.931 0.937 0.957
(0.144) (0.161) (0.171) (0.17 2) (0.163) (0.145) (0.12 3) (0.102) (0.083) 
0.019 0.056 0.111 0.184 0.272 0.373 0.488 0.617 0.771 
(0.339) (0.610) (0.7 78) (0.878) (0.909) (0.880) (0.798) (0.6 66) (0.473) 



For the mastery score = 4 enter N-xbar in the test mean column

Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 9
Mastery score C = 7

Test KR21-

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900

0,9 ,JS*!!2?w 1,000 1 * 000 °' 999 °' 997 0.99r~o"989"o"98o"~0~9 75" 
(0.001) (0.002) (0.005) (0.012) (0.024) (0.042) (0.063) (0.074) (0.056) 

0.000 0.003 0.012 0.038 0.091 0.179 0.309 0 483 0 704 
(0.010) (0.062) (0. 193) (0.405) (0. 659) (0.886) (1 .007) (0.956) (0.*688) 

1,8 mmL!*!^ 0,995 0,991 °' 985 °' 975 0.963 0.951 0.948 
( 2*2?i U n*^? ) (0 * 057) (0 ' 079) (0 ' l00 > < 0 - 112 ) (0-10^) (0.071) 
/2*22L,°* 010 0 ' 031 °' 071 °« 137 0.232 0.360 0,520 0 720 
(0.058) (0.165) (0.324) (0.506) (0. 666) (0. 764) (0. 769) (0. 669) (o!463) 

2.7 0.987 0.981 0.973 0.963 0.951 0.936 0.922 0 913 0 921 
(0.078X0.098) (0.1i9)(0.139) (0.152)(0.1.-)(6 45)(S J 6) (S.'oL) 
0.008 0.027 0.062 0.115 0.190 0.287 0.407 0 553 0 733 

(0. 175) (0. 313) (0. 468) (0. 596) (0. 676) (0. 694) (0. 644) (o!530) (0.351) 

3 * 6 ,°.*?Ji5w!! ,92 !w 0,914 °- 90 ° °' 886 °' 875 0.868 0 .873 0.901 
( S*o5 J )( ?'"8)(0.221)(0.217) (0.205) (0.183) (0.153)(0.120)(0 096) 
0.021 0.054 0.102 0.165 0.244 0.338 0.449 0 581 n 7A5 

(0.363)(0.490)( 0 .589)(0.648)(0.661)(0.628)(2:554)(S.l4j)(S:3S4) 
4.5 0.824 0.814 0.805 0.799 0.797 0.800 0.812 0 837 n ar? 

( ^M^ ( Si 8 T < ^?^f )( ^^ )( ^ 2ll)(o • l84)( ° : '»>< S: "«>(S.1?) 

0.038 0.087 0.145 0.214 0.293 0.382 0.484 0 604 0 755 
(0.585) (0.644) (0.671) (0.665) (0.630) (0.570) (0.488) (0.388) (0.*272) 

5.4 0.651 0.660 0.673 0.690 0.711 0.737 0.771 0 817 n A82 

/S'ScSw 0,116 °* 182 °' 254 °' 331 0.416 0.511 0.622 0.763 
(0.787) (0.761) (0.720) (0.666) (0.602) (0.529) (0.449) (0.360) (0.260) 

6.3 0.535 0.569 0.603 0.639 0.677 0.718 0.765 0 819 0 ftno 

(0 o'otV WW WlV WAV ( °- 300) (0 ' 263) (C226) (S:Io5)(2:n6) 
°*!; w °' 135 °.205 0.277 0.354 0.436 0.528 0.634 0 770 
(0.914) (0.831) (0.752X0. 675) (0.599) (0. 523) (0. 446) (0.364) (6.268) 

7,2 /2* 63 is , 0,656 °- 680 0- 706 0« 735 0.768 0.806 0.852 0.911 
(0 *S lVWll H ° n 'l^ H0 ' 313) (0.281)(0.249)(0.216)(0. 179)(S 3 ) 

^•s^w 0 /;?^, 0,204 0,278 °- 356 o - 44 ° 0.533 o.eio o.??* 

(0.911) (0.852) (0.788) (0.722) (0.652) (0.579) (0.502) (0.415) (0.308) 

8,1 0,879 °* 873 °' 871 0- 873 0.880 0.893 0.915 0.948 

(0,28 S )( ° ,2 SI ) (;- 2, * 4 >(0- 2 20) (0. 197) (0. 17C)(0.157)(C 137)(0 106) 
0.043 0.100 0.168 0.245 0.329 0.422 0 524 0 638 n 777 

i°lllH i2lff21£°l 87 2 > (0 ' 874) ( ° ,842) ( °" 782) (0.696) (0."583) (0.427) 

For the mastery score = 3 enter N-xbar in the test mean column

Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 9
Mastery score C = 8

Test KR21-

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900



0.9 1.000 1.000 1.000 1.000 0.999 0.998 0.996 0. J89 0.980 
(0.000) (0.000) (0.001) (0.002) (0.007) (0.0i. ) (0.034) (0.057) (0.058) 

0.000 0.000 0.004 0.015 0.045 0.109 0.222 0.398 0.648 
(0.001) (0.015) (0.071) (0.203) (0.422) (0. 697) (0.942) (1.029) (0.812) 

1.8 1.000 1.000 0.999 0.998 0.996 0.992 0.984 0.970 0.957 
(0.002) (0.004) (0.008) (0.016) (0.029) (0.049) (0.075) (0.094) (0.074) 
0.000 0.002 0.010 0.031 0.074 0.150 0.270 0.440 0.670 
(0.011) (0.051) (0.143) (0.292) (0.479) (0.659) (0.770) (0.749) (0.548) 

2.7 0.998 0.997 0.995 0.991 0.984 0.974 0.960 0.942 0.932 
(0.015) (0.024) (0.037) (0.054) (0.076) (0.101^(0.120) (0.118) (0.081) 

0.002 0.008 0.025 0.057 0.113 0.199 0.320 0.481 0.690 
(0.043) (0.127) (0.251) (0.402) (0.549) (0.656) (0.685) (0.611) (0.427) 

3.6 0.989 0.984 0.977 0.967 0.955 0.939 0.921 0.905 0.908 
(0.065) (0.084) (0.105) (0. 126) (0. 145) (0 . 157 ) (0 . 153) (0.127) (0.085) 
0.006 0.021 0.049 0.094 0.161 0.252 0.370 0.519 0.708 
(0.135) (0.250) (0.381) (0.507) (0. 601) (0. 642) (0.616) (0.517) (0.354) 

4.5 0.952 0.941 0.928 0.913 0.897 0.881 0.868 0.865 0.888 
(0.175) (0.191) (0.203) (0 . 20S) (0 . 205) (0. 189) (0.161) (0.124) (0.096) 

0.016 0.043 0.084 0.141 0.214 0.307 0.419 0.554 0.725 
(0.288) (0.407) (0.512) (0.588) (0.6 24) (0. 613) (0.553) (0.448) (0.307) 

5.4 0.855 0.842 0.829 0.818 0.809 0.806 0.810 0.829 0.876 
(0.297) (0.285^ !0. 267) (0.244) (0.216) (0. 186) (0. 156) (0. 132) (0. 116) 

0.032 0.074 0.127 0.192 0.269 0.358 0.462 0.535 0.740 
(0.49 7) (0.574) (0.622) (0.639) (0.623) (0.575) (0.500) (0.400) (0.280) 

6.3 0.634 0.686 0.692 0.701 0.716 0.737 0.767 0.810 0.876 
(0.305) (0.281) (0.259) (0.239) (0.222) (0.206) (0.189) (0.169) (0.139) 

0.050 0.107 0.170 0.240 0.318 0.403 0.499 0.611 0.753 
(0.725) (0.726) (0.705) (0.667) (0,614) (0.546) (0.467) (0.377) (C.274) 

7.2 0.539 0.570 0.603 0.639 0.677 0,719 0.767 0.823 0.894 
(0.432) (0.404) (0.375) (0.346) (0.316) (0.233) (0.248) (0.207) (0. 153) 
0.066 0.134 0.204 0.277 0.354 0.436 0.527 0.632 0.764 
(0.917) (0.845) (0.7 73) (0 . 70 1 ) (0 . 623) (0 . 555 ) (0. 478) (0.595) (0.297) 

8.1 0.671 0.694 0.718 0.744 0.773 0.305 0.841 0.883 0.934 
(0.442) (0.407) (0.3 74) (0.342) (0.310) (0.277) (0.241) (0.199) (0.140) 

0.069 0.140 0.213 0.289 0.368 0.452 0.544 0.647 0.773 
(0.982) (0.935) (0.880) (0.821) (0. 757) (0.686) (0.607) (0.513) (0, 391) 

For the mastery score = 2 enter N-xbar in the test mean column

Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 9
Mastery score C = 9

Test KR21-

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900

0.9 1.C00 1.000 1.000 1.000 1.000 1.000 0.999 0 . 996~~o798 7~ 
(0.000) (0.000) (0.000) (0.000) (0.001) (0.003) (0.011) (0.032) (0.060) 

0.000 0.000 0.001 0.004 0.015 0.045 0.117 0.263 0.530 
(0.000) (0.002) (0.015) (0.060) (0. 172) (0.380) (0 . 6 75) (0 . 962 ) (0 . 972) 

1.8 1.000 1.000 1.000 1.000 J. 999 0.998 0.995 0.987 0.969 

(0.000) (0.000) (0.001) (0.002) (0.005) (0. 013) (0.031) (0. 063) (0.085) 

fX'JJ?w!"X!!, °*° 02 °*° 08 0,026 °' 067 0. 151 0.304 1 
(0.001) (0.003) (0.035) (0 . 100) (0. 222) (0.401) (0.605) (0. 749) (0 . 675) 

2.7 1.000 1.000 0.999 0.999 0.997 0.994 0.987 0.971 0 946 
(0. 001) (0. 003) (C. 005) (0.010) (0.019) (0.036) (0 . 063) (0 . 097 ) (0 .' 099) 

0.000 0.001 0.006 0.017 0.044 0.097 0.192 0.343 C 590 
(0.006) (0. 026) (0.075) (0. 164) (0.295) (0.452) (0. 595) (0. 653) (0.* 535) 

3.6 0.999 0.998 0.9.7 0.994 0.990 0.982 0.968 0.946 0 920 

,5*5?}w 0,004 °* 013 °' 033 °' 071 0.135 0.239 0.394 0.619 
(0.024) (0.067) (0. 142) (0. 249) (0. 379) (0.505) (0. 590) (0 . 585) (0 . 44 6) 

4,5 r°/2?Jw?/n?L 0 * 987 0,981 °' 971 °- 956 °'936 0.910 0.893 
( °- 0 ^ )(0 -° 4 3) (0.064) (0.085) (0. 109) (0. 134) (0. 153) (0. 147) (0.095) 

rn'^w 0 /?, 12 ^, 0,0 " 0,059 °' 108 °' 183 °' 291 0.441 0. 646 
(0.072) (0. 143) (0.240) (0.352) (0.462) (0.547) (0. 578) (0. 52£») (0. 383) 

5.4 0.974 0.966 0.956 0.944 0.927 0.903 0.385 0.365 0.870 
( °'J 08)(0 -^«) (0.148) (0. 167H0. 183) (0. 139) (0. 179) (0. 142) (0.095) 
,2*?™w 0,026 °" 054 fj ' 096 °' 157 0.239 0.348 0.489 0.673 
(0.170) (0.264) (0. 365) (0.461) (0.534) (0.571) (0.555) (0.477) (0.339) 

6,3 fS'^wS'^rwS-S^w 0,864 0,847 °' 831 °' 819 0.821 0.856 
(0, " 7)(0 '22 6 > (0.249) (0.244) (0.230) (0.204) (0. 167) (0. 127) (0. 120) 

rn'^Swn* 05 ^, 0, 092 0, 147 °' 216 °- 300 - °- 407 0.535 0. 698 
(0.339) (0.430) (0.508) (0.563) (0 . 588) (0 . 5 78) (0 . 527) (0 . 435) (0.311) 

7 * 2 ,n'J 5 ;!wM 47 0, 739 °' 735 °' 735 °- 742 0. 760 0. 795 0.862 

( ^n!^ ( ^nar 2)( ^ 2C ^ (0, 233)(0,206)(0 • 134)(0 • 174 >( 0 'l 74 ) (0.167) 
, n rlL, 0,086 °' 143 °' 208 °- 2 83 0. 369 0.467 0.580 0.722 
(0.539) (0.635) (0.653) (0.656) (0.631) (0.532) (0.509) (0.417) (0.309) 

8.1 0.549 0.576 0.605 0.639 0.677 0.721 0.771 0 8"0 0 903 
( ^n^ )( n , ??, 3)(0, 331)(0,375)(0 - 363) (°- 344 >( 0 - 315 )(0. 270)(0:i97) 
,°/2^w 0,132 °* 203 °' 277 °' 354 0.436 0.524 0.623 0 744 

[°_'_l 2 _ ll[°_itlVJJ.:ttill°' 782> (0, 715> ( ° ,640) (0, 557) (0 ' 46 *> ^-367) 

For the mastery score = 1 enter N-xbar in the test mean column

Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 10
Mastery score C = 5



Test KR21- 

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900 



1.0 0.994 0.989 0.983 0.976 0.967 0.958 0.951 0.950 0.961
(0.056) (0.077) (0.099) (0.117) (0.128) (0.127) (0.114) (0.091) (0.071)

0.008 0.031 0.073 0.138 0.223 0.328 0.451 0.591 0.758
(0.199) (0.416) (0.634) (0.803) (0.897) (0.910) (0.846) (0.710) (0.497)

2.0 0.924 0.912 0.900 0.889 0.882 0.878 0.881 0.895 0.927
(0.245) (0.243) (0.234) (0.218) (0.196) (0.171) (0.145) (0.120) (0.096)

0.029 0.073 0.132 0.203 0.286 0.381 0.489 0.612 0.764
(0.499) (0.632) (0.714) (0.745) (0.730) (0.676) (0.592) (0.482) (0.341)

3.0 0.743 0.744 0.749 0.757 0.770 0.788 0.813 0.849 0.902
(0.324) (0.298) (0.273) (0.249) (0.225) (0.201) (0.175) (0.148) (0.113)

0.052 0.112 0.178 0.250 0.329 0.416 0.513 0.625 0.767
(0.756) (0.759) (0.736) (0.693) (0.633) (0.562) (0.480) (0.388) (0.278)

4.0 0.563 0.592 0.623 0.655 0.689 0.727 0.770 0.821 0.887
(0.421) (0.384) (0.349) (0.313) (0.278) (0.243) (0.206) (0.167) (0.122)

0.066 0.133 0.202 0.274 0.350 0.432 0.523 0.630 0.768
(0.889) (0.811) (0.735) (0.660) (0.585) (0.508) (0.430) (0.346) (0.250)

5.0 0.558 0.587 0.617 0.649 0.684 0.722 0.765 0.816 0.884
(0.413) (0.378) (0.344) (0.311) (0.277) (0.242) (0.206) (0.167) (0.121)

0.065 0.132 0.201 0.272 0.348 0.431 0.522 0.629 0.767
(0.332) (0.806) (0.730) (0.655) (0.580) (0.503) (0.424) (0.340) (0.245) 

6.0 0.722 0.724 0.730 0.739 0.753 0.772 0.798 0.835 0.892
(0.320) (0.295) (0.271) (0.248) (0.224) (0.201) (0.175) (0.147) (0.113)

0.052 0.111 0.176 0.248 0.326 0.412 0.509 0.621 0.764
(0.747) (0.744) (0.718) (0.673) (0.614) (0.541) (0.459) (0.367) (0.260)

7.0 0.897 0.884 0.872 0.862 0.855 0.852 0.857 0.873 0.910
(0.272) (0.264) (0.249) (0.229) (0.206) (0.179) (0.151) (0.124) (0.099)

0.032 0.076 0.134 0.204 0.285 0.378 0.483 0.606 0.759
(0.515) (0.620) (0.681) (0.698) (0.676) (0.619) (0.535) (0.429) (0.299)

8.0 0.981 0.974 0.964 0.953 0.941 0.929 0.920 0.919 0.936
(0.109) (0.130) (0.149) (0.161) (0.164) (0.156) (0.136) (0.108) (0.082)

0.012 0.039 0.083 0.146 0.228 0.329 0.447 0.584 0.751
(0.256) (0.432) (0.590) (0.701) (0.751) (0.737) (0.665) (0.544) (0.375)

9.0 0.999 0.998 0.996 0.993 0.987 0.980 0.971 0.964 0.967
(0.011) (0.021) (0.036) (0.055) (0.075) (0.092) (0.098) (0.085) (0.060)

0.002 0.012 0.038 0.033 0.164 0.268 0.399 0.555 0.742 
(0.068) (0.218) (0.436) (0 . 665 ) (0 . 04 7) (0.941) (0. 926) (0. 799) (0.555) 

For the mastery score = 6 enter N-xbar in the test mean column

Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 10
Mastery score C = 6



Test KR21- 

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900 



1.0 0.999 0.998 0.996 0.993 0.987 0.980 0.971 0.964 0.967 
(0.011) (0.021) (0. 036) (0.055) (0.075) (0.092) (0.098) (0.085) (0.060) 

0.002 0.012 0.038 0.038 0.164 0.268 0.399 0.555 0.742 
(0.063) (0.218) (0.436) (0.665) (0.347) (0.941) (0 . 926) (0 . 7 99) (0.555) 

2.0 0.981 0.974 0.964 0.953 0.941 0.929 0.920 0.919 0.936
(0.109) (0.130) (0.149) (0.161) (0.164) (0.156) (0.136) (0.108) (0.082)

0.012 0.039 0.083 0.146 0.228 0.329 0.447 0.584 0.751
(0.256) (0.432) (0.590) (0.701) (0.751) (0.737) (0.665) (0.544) (0.375)

3.0 0.897 0.884 0.872 0.862 0.855 0.852 0.857 0.873 0.910
(0.272) (0.264) (0.249) (0.229) (0.206) (0.179) (0.151) (0.124) (0.099)

0.032 0.076 0.134 0.204 0.285 0.378 0.483 0.606 0.759
(0.515) (0.620) (0.681) (0.698) (0.676) (0.619) (0.535) (0.429) (0.299)

4.0 0.722 0.724 0.730 0.739 0.753 0.772 0.798 0.835 0.892
(0.320) (0.295) (0.271) (0.248) (0.224) (0.201) (0.175) (0.147) (0.113)

0.052 0.111 0.176 0.248 0.326 0.412 0.509 0.621 0.764
(0.747) (0.744) (0.718) (0.673) (0.614) (0.541) (0.459) (0.367) (0.260)

5.0 0.558 0.587 0.617 0.649 0.684 0.722 0.765 0.816 0.884
(0.413) (0.378) (0.344) (0.311) (0.277) (0.242) (0.206) (0.167) (0.121)
0.065 0.132 0.201 0.272 0.348 0.431 0.522 0.629 0.767
(0.832) (0.806) (0.730) (0.655) (0.580) (0.503) (0.424) (0.340) (0.245)

6.0 0.563 0.592 0.623 0.655 0.689 0.727 0.770 0.821 0.887
(0.421) (0.384) (0.349) (0.313) (0.278) (0.243) (0.206) (0.167) (0.122)

0.066 0.133 0.202 0.274 0.350 0.432 0.523 0.630 0.768
(0.889) (0.811) (0.735) (0.660) (0.585) (0.508) (0.430) (0.346) (0.250)

7.0 0.743 0.744 0.749 0.757 0.770 0.788 0.813 0.849 0.902
(0.324) (0.298) (0.273) (0.249) (0.225) (0.201) (0.175) (0.148) (0.113)

0.052 0.112 0.178 0.250 0.329 0.416 0.513 0.625 0.767
(0.756) (0.759) (0.736) (0.693) (0.633) (0.562) (0.480) (0.388) (0.278)

8.0 0.924 0.912 0.900 0.889 0.882 0.878 0.881 0.895 0.927
(0.245) (0.243) (0.234) (0.218) (0.196) (0.171) (0.145) (0.120) (0.096)

0.029 0.073 0.132 0.203 0.286 0.381 0.489 0.612 0.764
(0.499) (0.632) (0.714) (0.745) (0.730) (0.676) (0.592) (0.482) (0.341)

9.0 0.994 0.989 0.983 0.976 0.967 0.958 0.951 0.950 0.961
(0.056) (0.077) (0.099) (0.117) (0.128) (0.127) (0.114) (0.091) (0.071)

0.008 0.031 0.073 0.138 0.223 0.328 0.451 0.591 0.758
(0.199) (0.416) (0.634) (0.803) (0.897) (0.910) (0.846) (0.710) (0.497)



For the mastery score ■» 5 enter N-xbar in the test mean colunn 
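
Reading the table can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names entry_mean and standard_error are hypothetical, the number of examinees (m = 100) is invented for the example, and the four tabled entries are those read from the row for test mean 5.0 under KR21 = .500 in the N = 10, C = 6 table. The sketch divides each tabled S.E.*SQRT(M) entry by the square root of M, and applies the footnote rule (enter N-xbar) for the complementary mastery score.

```python
import math

def entry_mean(n_items, cutoff, tabled_cutoff, mean):
    """Test mean to use when entering a table printed for `tabled_cutoff`."""
    if cutoff == tabled_cutoff:
        return mean
    if cutoff == n_items + 1 - tabled_cutoff:
        # Footnote rule: enter N - xbar in the test mean column.
        return n_items - mean
    raise ValueError("this table does not cover the requested mastery score")

def standard_error(tabled_value, m):
    """Convert a tabled S.E.*SQRT(M) entry into a standard error for m subjects."""
    return tabled_value / math.sqrt(m)

# Entries read from the N = 10, C = 6 table at test mean 5.0 and KR21 = .500.
p, g_p = 0.684, 0.277
kappa, g_kappa = 0.348, 0.580

m = 100  # hypothetical number of examinees
se_p = standard_error(g_p, m)
se_kappa = standard_error(g_kappa, m)
print("S.E. of p:", round(se_p, 4), " S.E. of kappa:", round(se_kappa, 4))

# A test with mastery score 5 and mean 4.0 is looked up in this table at 6.0.
print("enter the C = 6 table at mean", entry_mean(10, 5, 6, 4.0))
```

Because the tabled quantity does not depend on M, the same entry serves any sample size.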
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Table of the Raw Agreement Index and its 

S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 10
Mastery score C = 7

Test                     KR21
Mean .100 .200 .300 .400 .500 .600 .700 .800 .900

1.0 1.000 1.000 0.999 0.998 0.996 0.992 0.985 0.976 0.972
(0.002) (0.004) (0.010) (0.019) (0.034) (0.055) (0.073) (0.078) (0.056)

0.000 0.004 0.017 0.050 0.110 0.206 0.339 0.508 0.717
(0.017) (0.091) (0.251) (0.480) (0.722) (0.908) (0.977) (0.891) (0.627)

2.0 0.997 0.994 0.990 0.984 0.976 0.964 0.952 0.941 0.944
(0.030) (0.044) (0.063) (0.085) (0.105) (0.121) (0.123) (0.105) (0.073)

0.004 0.016 0.044 0.092 0.166 0.265 0.391 0.544 0.731
(0.098) (0.235) (0.410) (0.581) (0.709) (0.764) (0.732) (0.616) (0.421)

3.0 0.972 0.962 0.951 0.938 0.925 0.911 0.901 0.899 0.918
(0.136) (0.156) (0.171) (0.180) (0.181) (0.170) (0.147) (0.115) (0.086)

0.014 0.041 0.084 0.145 0.225 0.322 0.438 0.575 0.743
(0.271) (0.422) (0.556) (0.650) (0.689) (0.672) (0.601) (0.485) (0.329)

4.0 0.883 0.870 0.858 0.847 0.839 0.836 0.841 0.857 0.897
(0.281) (0.271) (0.256) (0.235) (0.210) (0.182) (0.153) (0.126) (0.101)

0.032 0.075 0.131 0.199 0.279 0.371 0.476 0.599 0.753
(0.506) (0.599) (0.654) (0.670) (0.648) (0.593) (0.510) (0.405) (0.279)

5.0 0.714 0.716 0.720 0.729 0.742 0.761 0.788 0.826 0.885
(0.316) (0.291) (0.267) (0.244) (0.222) (0.199) (0.176) (0.149) (0.115)

0.051 0.109 0.173 0.244 0.322 0.408 0.504 0.616 0.760
(0.730) (0.729) (0.705) (0.663) (0.605) (0.534) (0.452) (0.359) (0.254)

6.0 0.555 0.583 0.613 0.645 0.680 0.718 0.762 0.814 0.883
(0.405) (0.373) (0.341) (0.310) (0.278) (0.244) (0.209) (0.170) (0.125)

0.065 0.131 0.200 0.271 0.347 0.430 0.521 0.628 0.765
(0.878) (0.804) (0.730) (0.656) (0.582) (0.506) (0.427) (0.343) (0.248)

7.0 0.573 0.602 0.632 0.664 0.698 0.736 0.778 0.828 0.893
(0.431) (0.392) (0.355) (0.319) (0.284) (0.248) (0.211) (0.172) (0.125)

0.066 0.134 0.203 0.276 0.352 0.435 0.526 0.632 0.768
(0.900) (0.823) (0.747) (0.672) (0.598) (0.523) (0.446) (0.362) (0.265)

8.0 0.783 0.781 0.783 0.789 0.799 0.815 0.837 0.869 0.917
(0.323) (0.296) (0.271) (0.246) (0.222) (0.198) (0.174) (0.148) (0.114)

0.051 0.111 0.178 0.252 0.332 0.420 0.517 0.629 0.770
(0.758) (0.779) (0.768) (0.732) (0.677) (0.609) (0.527) (0.433) (0.315)

9.0 0.965 0.955 0.944 0.935 0.926 0.922 0.922 0.931 0.953
(0.175) (0.187) (0.191) (0.185) (0.171) (0.151) (0.128) (0.107) (0.087)

0.023 0.063 0.121 0.195 0.282 0.382 0.493 0.613 0.768
(0.441) (0.643) (0.786) (0.862) (0.875) (0.836) (0.753) (0.628) (0.449)

For the mastery score = 4 enter N-xbar in the test mean column
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Table of the Raw Agreement Index and its 

S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 10
Mastery score C = 8

Test                     KR21
Mean .100 .200 .300 .400 .500 .600 .700 .800 .900

1.0 1.000 1.000 1.000 1.000 0.999 0.997 0.993 0.985 0.978
(0.000) (0.001) (0.002) (0.005) (0.012) (0.025) (0.046) (0.065) (0.056)

0.000 0.001 0.006 0.024 0.065 0.143 0.268 0.446 0.681
(0.003) (0.029) (0.115) (0.289) (0.535) (0.794) (0.973) (0.974) (0.719)

2.0 1.000 0.999 0.998 0.996 0.992 0.985 0.975 0.961 0.952
(0.005) (0.010) (0.018) (0.031) (0.050) (0.073) (0.094) (0.100) (0.071)

0.001 0.005 0.019 0.049 0.105 0.1?? 0.321 0.488 0.700
(0.026) (0.096) (0.226) (0.402) (0.586) (0.725) (0.771) (0.693) (0.482)

3.0 0.995 0.991 0.987 0.979 0.969 0.956 0.940 0.926 0.927
(0.039) (0.055) (0.074) (0.096) (0.118) (0.134) (0.137) (0.117) (0.079)

0.004 0.017 0.043 0.087 0.156 0.250 0.373 0.526 0.717
(0.102) (0.221) (0.370) (0.519) (0.634) (0.687) (0.661) (0.554) (0.374)

4.0 0.968 0.958 0.947 0.933 0.918 0.903 0.890 0.885 0.904
(0.141) (0.160) (0.176) (0.186) (0.188) (0.178) (0.155) (0.120) (0.089)

0.014 0.039 0.079 0.136 0.212 0.307 0.422 0.560 0.731
(0.255) (0.389) (0.512) (0.604) (0.648) (0.638) (0.574) (0.462) (0.311)

5.0 0.883 0.869 0.856 0.844 0.834 0.829 0.831 0.846 0.886
(0.278) (0.271) (0.258) (0.238) (0.214) (0.185) (0.155) (0.127) (0.104)

0.029 0.071 0.124 0.189 0.267 0.358 0.464 0.588 0.744
(0.472) (0.564) (0.623) (0.646) (0.632) (0.583) (0.503) (0.399) (0.274)

6.0 0.718 0.717 0.720 0.726 0.737 0.754 0.780 0.818 0.879
(0.312) (0.286) (0.262) (0.239) (0.217) (0.197) (0.175) (0.152) (0.121)

0.049 0.104 0.167 0.237 0.315 0.400 0.497 0.610 0.754
(0.701) (0.709) (0.693) (0.657) (0.604) (0.536) (0.454) (0.362) (0.257)

7.0 0.554 0.581 0.611 0.643 0.677 0.716 0.760 0.814 0.884
(0.394) (0.367) (0.340) (0.312) (0.282) (0.251) (0.218) (0.180) (0.134)

0.064 0.130 0.199 0.271 0.347 0.429 0.520 0.627 0.763
(0.875) (0.806) (0.736) (0.664) (0.591) (0.517) (0.439) (0.356) (0.261)

8.0 0.591 0.619 0.649 0.680 0.714 0.751 0.793 0.842 0.905
(0.445) (0.405) (0.368) (0.331) (0.295) (0.259) (0.223) (0.183) (0.133)

0.067 0.130 0.206 0.280 0.357 0.439 0.530 0.636 0.769
(0.921) (0.847) (0.774) (0.702) (0.630) (0.557) (0.482) (0.399) (0.296)

9.0 0.860 0.853 0.850 0.851 0.856 0.866 0.882 0.907 0.944
(0.303) (0.279) (0.254) (0.230) (0.207) (0.186) (0.166) (0.144) (0.110)

0.048 0.107 0.175 0.251 0.335 0.425 0.525 0.637 0.773
(0.749) (0.827) (0.855) (0.844) (0.805) (0.742) (0.660) (0.553) (0.409)

For the mastery score = 3 enter N-xbar in the test mean column
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Table of the Raw Agreement Index and its 

S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 10
Mastery score C = 9

Test                     KR21
Mean .100 .200 .300 .400 .500 .600 .700 .800 .900

1.0 1.000 1.000 1.000 1.000 1.000 0.999 0.997 0.993 0.983
(0.000) (0.000) (0.000) (0.001) (0.003) (0.009) (0.022) (0.046) (0.058)

0.000 0.000 0.002 0.009 0.031 0.083 0.186 0.359 0.620
(0.000) (0.006) (0.039) (0.132) (0.317) (0.586) (0.867) (1.017) (0.840)

2.0 1.000 1.000 1.000 0.999 0.998 0.995 0.989 0.977 0.962
(0.001) (0.001) (0.003) (0.007) (0.016) (0.031) (0.056) (0.083) (0.076)

0.000 0.001 0.006 0.020 0.054 0.120 0.233 0.404 0.646
(0.004) (0.027) (0.090) (0.212) (0.389) (0.587) (0.737) (0.757) (0.570)

3.0 0.999 0.999 0.998 0.995 0.991 0.984 0.971 0.953 0.937
(0.006) (0.011) (0.019) (0.032) (0.050) (0.075) (0.101) (0.113) (0.083)

0.001 0.005 0.016 0.040 0.088 0.166 0.284 0.449 0.669
(0.024) (0.079) (0.178) (0.319) (0.478) (0.614) (0.677) (0.627) (0.443)

4.0 0.995 0.992 0.987 0.980 0.970 0.956 0.938 0.918 0.912
(0.036) (0.050) (0.068) (0.090) (0.113) (0.133) (0.143) (0.12?) (0.084)

0.004 0.014 0.035 0.073 0.133 0.220 0.338 0.492 0.691
(0.084) (0.173) (0.302) (0.437) (0.554) (0.625) (0.623) (0.535) (0.365)

5.0 0.972 0.963 0.953 0.939 0.923 0.906 0.888 0.877 0.890
(0.122) (0.142) (0.161) (0.176) (0.185) (0.182) (0.164) (0.127) (0.090)

0.011 0.032 0.066 0.117 0.187 0.278 0.392 0.532 0.710
(0.208) (0.325) (0.442) (0.540) (0.601) (0.612) (0.566) (0.463) (0.313)

6.0 0.898 0.884 0.870 0.855 0.842 0.831 0.827 0.836 0.874
(0.259) (0.260) (0.254) (0.241) (0.219) (0.191) (0.153) (0.126) (0.108)

0.025 0.061 0.109 0.170 0.245 0.335 0.442 0.568 0.728
(0.405) (0.501) (0.572) (0.612) (0.615) (0.581) (0.511) (0.408) (0.282)

7.0 0.739 0.733 0.731 0.732 0.739 0.751 0.773 0.809 0.872
(0.313) (0.286) (0.259) (0.234) (0.211) (0.191) (0.173) (0.157) (0.133)

0.044 0.096 0.156 0.225 0.301 0.387 0.485 0.599 0.743
(0.643) (0.673) (0.675) (0.653) (0.610) (0.548) (0.470) (0.377) (0.270)

8.0 0.555 0.581 0.609 0.641 0.675 0.714 0.760 0.815 0.888
(0.377) (0.359) (0.339) (0.317) (0.294) (0.268) (0.238) (0.202) (0.152)

0.063 0.129 0.198 0.269 0.346 0.428 0.519 0.624 0.757
(0.874) (0.815) (0.752) (0.686) (0.617) (0.545) (0.469) (0.386) (0.288)

9.0 0.637 0.664 0.692 0.722 0.755 0.790 0.829 0.874 0.928
(0.470) (0.430) (0.393) (0.357) (0.322) (0.286) (0.248) (0.204) (0.143)

0.070 0.141 0.214 0.289 0.367 0.450 0.540 0.642 0.768
(0.980) (0.919) (0.857) (0.793) (0.727) (0.657) (0.581) (0.491) (0.375)

For the mastery score = 2 enter N-xbar in the test mean column
Table of the Raw Agreement Index and its
S.E.*SQRT(M), the Kappa Index and its
S.E.*SQRT(M) in the Beta-binomial Model
M = Number of subjects
Number of items N = 10
Mastery score C = 10

Test                     KR21
Mean .100 .200 .300 .400 .500 .600 .700 .800 .900

1.0 1.000 1.000 1.000 1.000 1.000 1.000 0.999 0.997 0.989
(0.000) (0.000) (0.000) (0.000) (0.000) (0.002) (0.006) (0.023) (0.055)

0.000 0.000 0.000 0.002 0.009 0.032 0.093 0.229 0.497
(0.000) (0.001) (0.007) (0.036) (0.118) (0.294) (0.579) (0.901) (0.981)

2.0 1.000 1.000 1.000 1.000 1.000 0.999 0.997 0.991 0.973
(0.000) (0.000) (0.000) (0.001) (0.003) (0.007) (0.020) (0.049) (0.082)

0.000 0.000 0.001 0.005 0.017 0.050 0.124 0.269 0.530
(0.000) (0.004) (0.020) (0.066) (0.164) (0.329) (0.541) (0.721) (0.687)

3.0 1.000 1.000 1.000 0.999 0.999 0.997 0.991 0.978 0.952
(0.000) (0.001) (0.002) (0.005) (0.011) (0.023) (0.046) (0.082) (0.100)

0.000 0.001 0.003 0.011 0.032 0.075 0.162 0.314 0.563
(0.003) (0.014) (0.047) (0.117) (0.234) (0.380) (0.551) (0.642) (0.548)

4.0 1.000 0.999 0.998 0.997 0.994 0.989 0.977 0.957 0.927
(0.004) (0.007) (0.012) (0.020) (0.034) (0.056) (0.087) (0.118) (0.106)

0.00? 0.003 0.009 0.023 0.054 0.111 0.209 0.363 0.595
(0.013) (0.042) (0.100) (0.194) (0.319) (0.456) (0.565) (0.58?) (0.458)

5.0 0.997 0.995 0.992 0.988 0.980 0.969 0.950 0.923 0.899
(0.021) (0.030) (0.042) (0.060) (0.082) (0.109) (0.136) (0.145) (0.101)

0.002 0.008 0.020 0.045 0.088 0.157 0.262 0.413 0.626
(0.046) (0.102) (0.187) (0.295) (0.412) (0.513) (0.567) (0.535) (0.393)

6.0 0.984 0.978 0.970 0.960 0.946 0.927 0.903 0.879 0.872
(0.0??) (0.093) (0.114) (0.136) (0.157) (0.173) (0.175) (0.149) (0.093)

0.006 0.019 0.043 0.080 0.135 0.214 0.322 0.465 0.656
(0.124) (0.209) (0.303) (0.410) (0.498) (0.553) (0.554) (0.486) (0.344)

7.0 0.9?? 0.922 0.908 0.891 0.873 0.853 0.836 0.829 0.854
(0.197) (0.212) (0.223) (0.228) (0.224) (0.208) (0.175) (0.130) (0.109)

0.0?? 0.042 0.079 0.129 0.195 0.280 0.386 0.517 0.685
(0.277) (0.371) (0.457) (0.526) (0.566) (0.570) (0.529) (0.441) (0.312)

8.0 0.7?? 0.783 0.771 0.762 0.756 0.757 0.767 0.7?5 0.856
(0.317) (0.297) (0.272) (0.243) (0.212) (0.182) (0.162) (0.157) (0.158)

0.034 0.078 0.130 0.193 0.267 0.352 0.451 0.567 0.712
(0.524) (0.582) (0.619) (0.631) (0.617) (0.576) (0.508) (0.416) (0.304)

9.0 0.5?? 0.585 0.610 0.6?9 0.673 0.71? 0.762 0.821 0.896
(0.333) (0.335) (0.337) (0.336) (0.331) (0.320) (0.299) (0.263) (0.198)

0.0?? 0.1?? 0.195 0.268 0.345 0.427 0.516 0.615 0.737
(0.877) (0.850) (0.810) (0.758) (0.696) (0.624) (0.544) (0.454) (0.354)

For the mastery score = 1 enter N-xbar in the test mean column
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APPENDIX B 

A Computer Program To Compute the Reliability Indices
for Decision in Mastery Testing and Their Standard
Errors of Estimate Based on the Beta-Binomial Model

Disclaimer: The computer program hereafter listed has been written
with care and tested extensively under a variety of conditions using
tests with 50 or fewer items. The author, however, makes no warranty
as to its accuracy and functioning, nor shall the fact of its
distribution imply such warranty.
C **********************************************************************
C
C     A COMPUTER PROGRAM TO COMPUTE THE RELIABILITY INDICES
C     FOR DECISION IN MASTERY TESTING AND THEIR STANDARD
C     ERRORS OF ESTIMATE BASED ON THE BETA-BINOMIAL MODEL.
C
C     INPUT DATA ARE:
C     FIRST CARD:  TITLE CARD. ENTER ANYTHING YOU WANT.
C
C     SECOND CARD: MUST CONTAIN THE FOLLOWING INFORMATION
C
C          N......NUMBER OF ITEMS
C          M......NUMBER OF SUBJECTS OR EXAMINEES
C          K......NUMBER OF CLASSIFICATION CATEGORIES
C          XBAR...MEAN OF TEST SCORES
C          SD.....STANDARD DEVIATION OF TEST SCORES
C          FORMAT FOR SECOND CARD IS (3I5,2F10.5).
C
C     THIRD CARD:  MUST CONTAIN THE (K-1) CUTOFF SCORES.
C          FORMAT IS (16I5).
C
C **********************************************************************
C
C     REMARK: THIS PROGRAM IS SET UP FOR TESTS WITH
C     UP TO 60 ITEMS. FOR LONGER TESTS USE THE FOLLOWING
C     DIMENSION MODIFICATIONS IN SUBROUTINE KAPPA.
C
C     LET N BE THE NUMBER OF TEST ITEMS.
C     THEN THE DIMENSION OF F(.), XA(.), XB(.) AND CF(.) IS N+1.
C
C
C     ALSO UP TO 17 CLASSIFICATION CATEGORIES CAN BE ACCOMMODATED.
C     FOR MORE CATEGORIES CHANGE L(17) TO L(K) IN THE MAIN
C     PROGRAM, K BEING THE NUMBER OF CATEGORIES.
C
C **********************************************************************
C
      DIMENSION TITLE(20),L(17)
      DOUBLE PRECISION A,B,F
    1 READ(5,100,END=99) TITLE
  100 FORMAT(20A4)
      WRITE(6,200) TITLE
  200 FORMAT('1',//////T10,'ESTIMATES OF DECISION RELIABILITY'/
     *       T10,'AND THEIR STANDARD ERRORS IN'/
     *       T10,'MASTERY TESTING BASED ON THE BETA-'/
     *       T10,'BINOMIAL MODEL'/
     *       T10,'TITLE OF THIS JOB IS:'/
     *       T10,20A4/)
      READ(5,105) N,M,K,XBAR,SD
  105 FORMAT(3I5,2F10.5)
      KM1=K-1
      READ(5,110) (L(I),I=1,KM1)
  110 FORMAT(16I5)
      WRITE(6,205) N,M,XBAR,SD,K
  205 FORMAT(T10,'INPUT DATA ARE:'//
     *       T10,'NUMBER OF ITEMS = ',I4/
     *       T10,'NUMBER OF SUBJECTS = ',I4/
     *       T10,'MEAN OF TEST SCORE = ',F10.5/
     *       T10,'STANDARD DEVIATION OF TEST SCORE = ',F10.5/
     *       T10,'NUMBER OF CATEGORIES = ',I4)
      IF(K.EQ.2) WRITE(6,206) L(1)
  206 FORMAT(T10,'CUTOFF SCORE = ',I4)
      IF(K.GT.2) WRITE(6,207) (L(I),I=1,KM1)
  207 FORMAT(T10,'CUTOFF SCORES = ',I4,16I5)
C     KR21 RELIABILITY COMPUTED FROM THE MEAN AND STANDARD DEVIATION
      F=N/(N-1.)*(1.-XBAR*(N-XBAR)/(N*SD**2))
      IF(F.GT.0.) GOTO 5
      WRITE(6,210)
  210 FORMAT(/T10,'NON-POSITIVE ESTIMATE KR21.'/
     *       T10,'MOMENT ESTIMATES FOR ALPHA AND BETA DO NOT EXIST.'/
     *       T10,'COMPUTATIONS DISCONTINUED FOR THIS CASE.')
      GOTO 1
C     MOMENT ESTIMATES OF ALPHA AND BETA
    5 A=(-1.+1./F)*XBAR
      B=-A+N/F-N
      CALL KAPPA(N,A,B,K,L,M,XP,SDP,XK,SDK)
      WRITE(6,215) A,B,F,XP,SDP,XK,SDK
  215 FORMAT(/T10,'OUTPUT DATA ARE:'//
     *       T10,'ALPHA = ',F10.5/
     *       T10,'BETA = ',F10.5/
     *       T10,'KR21 = ',F10.5//
     *       T10,'RAW AGREEMENT INDEX P = ',F8.5/
     *       T10,'STANDARD ERROR OF P.. = ',F8.5//
     *       T10,'KAPPA INDEX = ',F8.5/
     *       T10,'STANDARD ERROR OF KAPPA = ',F8.5)
      WRITE(6,220)
  220 FORMAT('0',//,T7,'** NORMAL END FOR THIS JOB **'/
     *       T10,'PROGRAM WRITTEN BY HUYNH HUYNH'/
     *       T10,'COLLEGE OF EDUCATION'/
     *       T10,'UNIVERSITY OF SOUTH CAROLINA'/
     *       T10,'COLUMBIA, SOUTH CAROLINA 29208'/
     *       T10,'REVISED, DECEMBER 1979')
      GOTO 1
   99 STOP
      END

      SUBROUTINE KAPPA(N,A,B,K,L,M,XP,SDP,XK,SDK)
      DIMENSION F(61),CF(61),XA(61),XB(61),L(1)
      DOUBLE PRECISION A,B,F,CF,XA,XB,P,PC,A1,A2,A3,VA,VB,VAB,TWO,VKP,
     *       VP,DPA,DPB,DPCA,DPCB,BFZ,DBFA,DBFB,DSA,DSB,SUMBF
      TWO=2.D0
C
      L(K)=N+1
C
      CALL NEHY(N,A,B,F,CF)
      CALL VARAB(N,A,B,VA,VB,VAB,M,F,XA,XB)
      CALL ZERLAB(N,A,B,XA,XB,F)
C
C     CHANCE AGREEMENT PC AND ITS DERIVATIVES
      PC=CF(L(1))**2
      DPCA=TWO*CF(L(1))*XA(L(1))
      DPCB=TWO*CF(L(1))*XB(L(1))
C
      DO 5 I=2,K
      IM1=I-1
      A1=CF(L(I))-CF(L(IM1))
      PC=PC+A1*A1
      DPCA=DPCA+TWO*A1*(XA(L(I))-XA(L(IM1)))
    5 DPCB=DPCB+TWO*A1*(XB(L(I))-XB(L(IM1)))
C
      IF(K.GT.2) GOTO 9
C
C     OTHERWISE THERE ARE TWO CATEGORIES.
C
      ICUT=L(1)-1
      IF(2*L(1).LE.N) GOTO 6
      ICUT=N-L(1)
      CALL BF(N,0,ICUT,B,A,BFZ,DBFB,DBFA,DSB,DSA,SUMBF)
      A1=CF(L(2))-CF(L(1))
      P=1.D0-2.D0*(A1-SUMBF)
      DPA=-2.D0*(XA(L(2))-XA(L(1))-DSA)
      DPB=-2.D0*(XB(L(2))-XB(L(1))-DSB)
      GOTO 15
C
    6 CALL BF(N,0,ICUT,A,B,BFZ,DBFA,DBFB,DSA,DSB,SUMBF)
      A1=CF(L(1))
      P=1.D0-2.D0*(A1-SUMBF)
      DPA=-2.D0*(XA(L(1))-DSA)
      DPB=-2.D0*(XB(L(1))-DSB)
      GOTO 15
C
    9 DPA=0.D0
      DPB=0.D0
      P=0.D0
C
      DO 10 I=1,K
      LL=0
      IF(I.GT.1) LL=L(I-1)
      LU=L(I)-1
      CALL BF(N,LL,LU,A,B,BFZ,DBFA,DBFB,DSA,DSB,SUMBF)
      P=P+SUMBF
      DPA=DPA+DSA
   10 DPB=DPB+DSB
C
C     KAPPA, ITS DERIVATIVES, AND THE STANDARD ERRORS
   15 A1=1.D0-PC
      A2=1.D0-P
      A3=A1*A1
      DKA=(DPA*A1-DPCA*A2)/A3
      DKB=(DPB*A1-DPCB*A2)/A3
C
      VKP=VA*DKA**2+VB*DKB**2+2*VAB*DKA*DKB
      VP=VA*DPA**2+VB*DPB**2+2*VAB*DPA*DPB
      SDK=VKP**.5
      XP=P
      SDP=VP**.5
      XK=(P-PC)/A1
C
      RETURN
      END

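      SUBROUTINE NEHY(N,A,B,F,CF)
      DIMENSION F(1),CF(1)
      DOUBLE PRECISION A,B,F,CF,Z1,Z2
      Z1=DFLOAT(N)+B
      Z2=Z1+A
      K=0
      F(1)=1.D0
C     THE DO STATEMENT BELOW AND THE TWO STATEMENTS FOLLOWING LABEL 10
C     ARE MISSING FROM THE SOURCE LISTING. THEY ARE RECONSTRUCTED HERE,
C     AS AN ASSUMPTION, FROM THE STANDARD BETA-BINOMIAL RECURRENCE.
      DO 5 I=1,N
    5 F(1)=F(1)*(Z1-DFLOAT(I))/(Z2-DFLOAT(I))
   10 KP1=K+1
      KP2=K+2
      F(KP2)=F(KP1)*DFLOAT(N-K)*(A+DFLOAT(K))/
     *       (DFLOAT(KP1)*(Z1-DFLOAT(KP1)))
      K=K+1
      IF(K-N) 10,15,15
   15 CF(1)=F(1)
      DO 20 I=1,N
      IP1=I+1
   20 CF(IP1)=CF(I)+F(IP1)
   25 RETURN
      END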

C     THE SUBROUTINE STATEMENT AND THE DECLARATION LIST ARE ILLEGIBLE IN
C     THE SOURCE LISTING. THE ARGUMENT LIST IS TAKEN FROM THE CALLS IN
C     SUBROUTINE KAPPA. THE DECLARATION LIST IS AN ASSUMPTION.
      SUBROUTINE BF(N,LL,LU,A,B,BFZ,DBFA,DBFB,DSA,DSB,SUMBF)
      DOUBLE PRECISION A,B,BFZ,DBFA,DBFB,DSA,DSB,SUMBF,AA,XA,XB,X,Y,T,
     *       DN,DLL,Z1,Z1M1,Z2,AAHOLD,XAHOLD,XBHOLD
      N2=N+N
      IR=LU-LL+1
      DN=DFLOAT(N)
      Z1=DFLOAT(N2)+B
      Z1M1=Z1-1.D0
      Z2=Z1+A
      DLL=DFLOAT(LL)
C
      IF(LL.NE.0) GOTO 10
C
      AA=1.D0
      XA=0.D0
      XB=0.D0
C
      DO 5 I=1,N2
      T=DFLOAT(I)
      AA=AA*(Z1-T)/(Z2-T)
      XA=XA-1.D0/(Z2-T)
    5 XB=XB+1.D0/(Z1-T)
C
      XB=XB+XA
C
      GOTO 15
C
   10 X=DLL-1.D0
      Y=DLL-1.D0
C     [TWO STATEMENTS MISSING FROM THE SOURCE LISTING]
      XB=DBFB-1.D0/(Z1M1-X-Y)
C
      X=LL
C     [TWO STATEMENTS MISSING FROM THE SOURCE LISTING]
      XB=XB-1.D0/(Z1M1-X-Y)
C
   15 SUMBF=AA
      DSA=XA*AA
      DSB=XB*AA
C
      IF(IR.EQ.1) GOTO 90
C
      AAHOLD=AA
      XAHOLD=XA
      XBHOLD=XB
C
      DO 50 I=2,IR
      X=DLL+DFLOAT(I-2)
      Y=DLL
C     [TWO STATEMENTS MISSING FROM THE SOURCE LISTING]
      XB=XBHOLD-1.D0/(Z1M1-X-Y)
C     [ONE STATEMENT MISSING FROM THE SOURCE LISTING]
      DSA=DSA+2.D0*XA*AA
      DSB=DSB+2.D0*XB*AA
      SUMBF=SUMBF+2.D0*AA
C
      AAHOLD=AA
      XAHOLD=XA
      XBHOLD=XB
C
      X=X+1.D0
C
      DO 50 J=2,I
      Y=DLL+DFLOAT(J)-2.D0
C
C     [TWO STATEMENTS MISSING FROM THE SOURCE LISTING]
      XB=XB-1.D0/(Z1M1-X-Y)
C
      IF(I.EQ.J) GOTO 40
      SUMBF=SUMBF+2.D0*AA
      DSA=DSA+2.D0*XA*AA
      DSB=DSB+2.D0*XB*AA
      GOTO 50
C
   40 SUMBF=SUMBF+AA
      DSA=DSA+XA*AA
      DSB=DSB+XB*AA
   50 CONTINUE
C
   90 BFZ=AA
      DBFA=XA
      DBFB=XB
      RETURN
      END

      SUBROUTINE ZERLAB(N,A,B,XA,XB,F)
      DIMENSION XA(1),XB(1),F(1)
      DOUBLE PRECISION A,B,Z1,Z2,XA,XB,F,ONE
      ONE=1.D0
C
      XA(1)=0.D0
      XB(1)=0.D0
      Z1=DFLOAT(N)+B
      Z2=Z1+A
      NP1=N+1
      DO 5 I=1,N
      XA(1)=XA(1)-ONE/(Z2-DFLOAT(I))
    5 XB(1)=XB(1)+ONE/(Z1-DFLOAT(I))
      XB(1)=XB(1)+XA(1)
      DO 10 I=1,N
      IP1=I+1
      IX=I-1
      XA(IP1)=XA(I)+ONE/(A+DFLOAT(IX))
   10 XB(IP1)=XB(I)-ONE/(Z1-DFLOAT(I))
      XA(1)=XA(1)*F(1)
      XB(1)=XB(1)*F(1)
      DO 30 I=2,NP1
      IM1=I-1
      XA(I)=XA(IM1)+XA(I)*F(I)
   30 XB(I)=XB(IM1)+XB(I)*F(I)
      RETURN
      END

      SUBROUTINE VARAB(N,A,B,VA,VB,VAB,M,F,DA,DB)
      DIMENSION F(1),DA(1),DB(1)
      DOUBLE PRECISION A,B,DA,DB,F,B11,B12,B22,D,VA,VB,VAB
      CALL DERLAB(N,A,B,DA,DB)
C     INFORMATION MATRIX FOR (ALPHA, BETA) BASED ON M SUBJECTS
      B11=0.D0
      B12=0.D0
      B22=0.D0
      NP1=N+1
      DO 15 I=1,NP1
      B11=B11+DA(I)*DA(I)*F(I)
      B12=B12+DA(I)*DB(I)*F(I)
   15 B22=B22+DB(I)*DB(I)*F(I)
      B11=B11*M
      B12=B12*M
      B22=B22*M
C     INVERSE OF THE 2 BY 2 INFORMATION MATRIX
      D=B11*B22-B12*B12
      VA=B22/D
      VB=B11/D
      VAB=-B12/D
      RETURN
      END

      SUBROUTINE DERLAB(N,A,B,DA,DB)
      DIMENSION DA(1),DB(1)
      DOUBLE PRECISION A,B,DA,DB,Z1,Z2
      DOUBLE PRECISION ONE
      ONE=1.D0
      DA(1)=0.D0
      DB(1)=0.D0
      Z1=DFLOAT(N)+B
      Z2=Z1+A
      NP1=N+1
C
      DO 5 I=1,N
      DA(1)=DA(1)-ONE/(Z2-DFLOAT(I))
    5 DB(1)=DB(1)+ONE/(Z1-DFLOAT(I))
      DB(1)=DB(1)+DA(1)
C
      DO 10 I=1,N
      IP1=I+1
      IX=I-1
      DA(IP1)=DA(I)+ONE/(A+DFLOAT(IX))
   10 DB(IP1)=DB(I)-ONE/(Z1-DFLOAT(I))
      RETURN
      END
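
The first two computational steps of the listing (the moment estimates of alpha and beta in the main program, and the marginal score distribution built in subroutine NEHY) can be sketched in Python as follows. This is an illustrative re-expression under stated assumptions, not the author's program: the function names kr21, moment_estimates, and beta_binomial_pmf are hypothetical, and the recurrence is the standard one for the beta-binomial (negative hypergeometric) distribution.

```python
import math

def kr21(n, mean, sd):
    """KR21 reliability from the number of items, test mean, and standard deviation."""
    return n / (n - 1.0) * (1.0 - mean * (n - mean) / (n * sd ** 2))

def moment_estimates(n, mean, sd):
    """Moment estimates (alpha, beta) of the beta-binomial model, plus KR21."""
    r = kr21(n, mean, sd)
    if r <= 0.0:
        raise ValueError("non-positive KR21: moment estimates do not exist")
    alpha = (-1.0 + 1.0 / r) * mean
    beta = -alpha + n / r - n
    return alpha, beta, r

def beta_binomial_pmf(n, alpha, beta):
    """Probabilities f(0), ..., f(n) of the beta-binomial score distribution."""
    f0 = 1.0
    for i in range(1, n + 1):
        f0 *= (n + beta - i) / (n + alpha + beta - i)
    pmf = [f0]
    for x in range(n):
        # f(x+1) = f(x) * (n-x)(alpha+x) / ((x+1)(n+beta-x-1))
        pmf.append(pmf[x] * (n - x) * (alpha + x) / ((x + 1) * (n + beta - x - 1)))
    return pmf

# Hypothetical input: 10 items, mean 5, variance 10. These moments correspond
# to alpha = beta = 1, for which every score is equally likely.
alpha, beta, r = moment_estimates(10, 5.0, math.sqrt(10.0))
pmf = beta_binomial_pmf(10, alpha, beta)
print(alpha, beta, r)
print([round(f, 4) for f in pmf])
```

The cumulative sums of the returned list play the role of the array CF in the listing, from which the chance agreement is formed.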



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ACCURACY OF TWO PROCEDURES FOR ESTIMATING 
RELIABILITY OF MASTERY TESTS 



Huynh Huynh 
Joseph C. Saunders 

University of South Carolina 

Presented at the annual conference of the Eastern Educational
Research Association, Kiawah Island, South Carolina, February 22-24,
1979. A short version of this paper will appear in Journal of
Educational Measurement (in press).

ABSTRACT 

Single administration (beta-binomial) estimates for the raw
agreement index p and the corrected-for-chance kappa index in
mastery testing are compared with those based on repeated test
administrations in terms of estimation bias and sampling
variability. Across a variety of test score distributions, test
lengths, and mastery (cutoff) scores, the beta-binomial estimates
tend to underestimate the corresponding population values. The
percent of bias is small (about I.Ya) for p and somewhat larger
(about 10%) for kappa. Both beta-binomial estimates have standard
errors about one-half the size of the standard errors of estimates
based on repeated test administrations. Though the beta-binomial
estimates presume equality of item difficulty, the data presented
indicate that even gross departures from equality of item difficulty
do not affect the amount of bias of the estimates.



This paper has been distributed separately as RM 79-1, February 
1979. 
1. INTRODUCTION

In mastery testing, reliability is often viewed as the consistency
of mastery-nonmastery decisions across repeated test administrations
(Huynh, 1976, 1978a; Subkoviak, 1976). Two reliability indices have
been proposed and studied for mastery tests. They are the raw
agreement index p and the corrected-for-chance kappa index
($\kappa$). The first index represents the proportion of examinees
consistently classified in the same (mastery or nonmastery) category
over two test administrations using the same form or two equivalent
forms. It is assumed, of course, that the first testing does not
induce any lasting change in the examinees. The second index,
kappa, is defined as $\kappa = (p - p_c)/(1 - p_c)$, where $p_c$ is
the proportion of consistent classification expected under complete
random assignment. Thus kappa reflects the extent to which test
scores will improve the consistency of decisions beyond the level
expected by random classification. The relationship between kappa
and other parameters such as cutoff score and classical test
reliability may be found in Huynh (1978a).
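
The two indices can be illustrated with a short Python sketch. It assumes a 2 x 2 table of joint classification proportions, with index 0 for nonmastery and 1 for mastery. The function name agreement_indices and the example proportions are hypothetical and serve only to show the arithmetic of the definitions above.

```python
def agreement_indices(table):
    """Return (p, p_c, kappa) for a 2 x 2 table of joint proportions.

    table[i][j] is the proportion classified in category i on the first
    testing and category j on the second (0 = nonmaster, 1 = master).
    """
    p = table[0][0] + table[1][1]                       # raw agreement
    first = [table[0][0] + table[0][1], table[1][0] + table[1][1]]
    second = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    p_c = first[0] * second[0] + first[1] * second[1]   # chance agreement
    kappa = (p - p_c) / (1.0 - p_c)
    return p, p_c, kappa

# Hypothetical proportions: 20% consistent nonmasters, 60% consistent masters.
example = [[0.20, 0.10],
           [0.10, 0.60]]
p, p_c, kappa = agreement_indices(example)
print(round(p, 3), round(p_c, 3), round(kappa, 3))
```

In this example the raw agreement of .80 exceeds the chance level of .58, so kappa is (.80 - .58)/(1 - .58), or about .52.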

The definitions of both p and kappa assume the feasibility of
repeated test administrations. This may not be practical in many
instances. Under some conditions, p and kappa may be approximated
from a single test administration. There are at least two
procedures to accomplish this, namely, those described in Huynh
(1976) and Subkoviak (1976). The Huynh procedure assumes that the
test scores are distributed as predicted by a univariate or
bivariate beta-binomial model. On the other hand, the Subkoviak
technique, in its simplest form, assumes that test scores are
distributed as predicted by a binomial distribution and that the
regression of true score on observed test score is linear.

Subkoviak (1978) has provided a comparison of these two
procedures using simulations with fifty repetitions. The data
reported in Table 2 of his paper clearly indicate that both
procedures act almost identically in terms of estimation bias and
standard error. This is an expected result. Linear regression of
true score on observed score in the binomial error model
automatically implies that the test score distribution under study
must belong to the negative hypergeometric (beta-binomial) family
(Lord & Novick, 1968, p. 516). Hence it appears that the conditions
underlying the Subkoviak procedure are those of the beta-binomial
distribution assumed in Huynh's paper (1976). For this reason and
for inherent complexities in formulating inferential techniques
associated with the Subkoviak procedure, this paper will be
restricted to the beta-binomial model in the estimation of
reliability for mastery tests.

The purpose of this paper is to compare the accuracy of two
procedures for estimating reliability of decisions in mastery
testing. One procedure is based on two test administrations; the
other procedure relies on only one test administration and performs
all computations assuming the appropriateness of the beta-binomial
model for the test data under study. Sections 2, 3, and 4 deal with
the asymptotic (large sample of examinees) nature of the estimates.
Section 5 reports a simulation study for the case of small samples.

2. ASYMPTOTIC BIAS AND STANDARD ERRORS 

Though the number of classification categories may be arbitrary,
we will consider only the case of two categories, labeled mastery
and nonmastery. The lowest score for which an examinee will be
classified as a master will be referred to as the mastery (or
passing) score in subsequent discussion.

First let us consider estimating p and $\kappa$ by testing a sample
of m examinees twice. Let $\hat{p}_{ij}$ be the proportion of
examinees classified in the i-th category on the first testing and
in the j-th category on the second testing. Here let i = 0 for a
nonmaster and i = 1 for a master. Let the dot (.) bear the regular
summation meaning. For example, the marginal proportion of masters
on the first testing is $\hat{p}_{1.} = \hat{p}_{10} + \hat{p}_{11}$.

The observed proportion* of consistent classifications in the
sample at hand is $\hat{p}_R = \hat{p}_{00} + \hat{p}_{11}$, and the
kappa index for this sample is

    $\hat{\kappa}_R = (\hat{p}_R - \hat{p}_c)/(1 - \hat{p}_c)$,    (1)

where $\hat{p}_c = \hat{p}_{0.}\hat{p}_{.0} + \hat{p}_{1.}\hat{p}_{.1}$.
Under random sampling, $\hat{p}_R$ is an efficient statistic for the
parameter p (Hogg & Craig, 1970, p. 372). In other words,
$\hat{p}_R$ is unbiased and its standard error is equal to the
Rao-Cramer lower bound. This standard error is $(p(1-p)/m)^{.5}$.
It may also be noted that $\hat{p}_R$ is the maximum likelihood (ML)
estimate of the population value of p and that $\hat{\kappa}_R$ is
an ML estimate of the population value of $\kappa$. Its asymptotic
(large sample) properties are well known. For example,
$\hat{\kappa}_R$ follows an approximate normal distribution with
mean $\kappa$ and with a variance of

    $\frac{1}{m} \left[ \frac{p(1-p)}{(1-p_c)^2} + \frac{2(1-p)(2 p p_c - a)}{(1-p_c)^3} + \frac{(1-p)^2 (b - 4 p_c^2)}{(1-p_c)^4} \right]$,    (2)

where

    $a = p_{00}(p_{0.} + p_{.0}) + p_{11}(p_{1.} + p_{.1})$    (3)

and

    $b = \sum_i \sum_j p_{ij} (p_{j.} + p_{.i})^2$    (4)

(Bishop et al., 1974, p. 396). In these formulae, all quantities
listed are population values. When sample proportions are used in
(2), the resulting value is an estimate for the variance of
$\hat{\kappa}_R$. Finally, since the asymptotic mean of
$\hat{\kappa}_R$ is $\kappa$, $\hat{\kappa}_R$ is asymptotically
an unbiased estimate for this parameter.

* The subscript R means repeated testings.
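
The variance expression in Equations (2) to (4) can be evaluated numerically, as in the following Python sketch. It is a minimal illustration that assumes the large-sample variance of kappa takes the form given by Bishop et al. for a square table of proportions. The function names and the example table are hypothetical.

```python
import math

def kappa_variance(table, m):
    """Asymptotic variance of kappa from a k x k table of joint proportions."""
    k = len(table)
    row = [sum(table[i]) for i in range(k)]                       # p_{i.}
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]  # p_{.j}
    p = sum(table[i][i] for i in range(k))
    p_c = sum(row[i] * col[i] for i in range(k))
    a = sum(table[i][i] * (row[i] + col[i]) for i in range(k))
    b = sum(table[i][j] * (row[j] + col[i]) ** 2
            for i in range(k) for j in range(k))
    bracket = (p * (1 - p) / (1 - p_c) ** 2
               + 2 * (1 - p) * (2 * p * p_c - a) / (1 - p_c) ** 3
               + (1 - p) ** 2 * (b - 4 * p_c ** 2) / (1 - p_c) ** 4)
    return bracket / m

def raw_agreement_se(p, m):
    """Standard error of the repeated-testing estimate of p."""
    return math.sqrt(p * (1 - p) / m)

# Hypothetical 2 x 2 table of joint proportions and sample size.
example = [[0.20, 0.10],
           [0.10, 0.60]]
m = 100
var_kappa = kappa_variance(example, m)
se_kappa = math.sqrt(var_kappa)
se_p = raw_agreement_se(0.80, m)
print(round(se_p, 4), round(se_kappa, 4))
```

Multiplying either standard error by the square root of m removes the dependence on sample size.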

Consider now estimating p and $\kappa$ from a single test
administration. The estimates*, $\hat{p}_B$ and $\hat{\kappa}_B$,
are described in detail in Huynh (1976); the asymptotic standard
errors of both estimates may be obtained via the formulae, tables,
or computer program described elsewhere (Huynh, 1978b). In the
latter paper it is also shown that $\hat{p}_B$ and $\hat{\kappa}_B$
are asymptotically unbiased estimates of p and $\kappa$.

* The subscript B refers to the beta-binomial model.

3. A COMPARISON OF THE ASYMPTOTIC STANDARD ERRORS
OF ESTIMATE FOR BETA-BINOMIAL TEST DATA

Whether estimation is based on repeated or single testings,
$\sqrt{m}$ times the standard error (S.E.) of the estimate is (or is
asymptotically) not a function of the sample size m. Thus m is not
a significant factor in any comparison of the estimates as long as
sufficiently large samples are to be considered. In this section
and most subsequent ones, only the quantity G = $\sqrt{m}$ x S.E.
will be considered.

The comparisons described in this section are limited to test 
score distributions that follow the beta-binomial distribution. 
Strictly speaking, the procedure for estimating from a single 
administration (Huynh, 1976) is formulated only for this type of 
data. 

The comparison was made for selected situations with n = 5,
10, 20, and 30 test items. The test mean ($\mu$) and KR21
reliability ($\alpha_{21}$) were chosen such that the resulting test
score distribution would be one of the following types: (i) U-shaped
with the higher-density mode at the upper end of the score range,
(ii) symmetric, (iii) unimodal with a mode somewhere between $\mu$
and n, or (iv) J-shaped. The passing score c was chosen such that
the ratio c/n would be 60, 70, or 80%. The G-values for
$\hat{\kappa}_R$ were computed via Equations (2), (3), and (4) with
the $p_{ij}$ proportions generated by the bivariate beta-binomial
model. The G-values for $\hat{p}_B$ and $\hat{\kappa}_B$ were
obtained via the computer program described in Huynh (1978b).
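
Generating the joint proportions from the bivariate beta-binomial model can be sketched in Python. It is assumed here that the model takes its usual form: two n-item tests that are binomial given the true score, with a beta(alpha, beta) distribution of true scores. The function names are hypothetical.

```python
import math

def log_beta(a, b):
    """Natural logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_parameters(n, mean, kr21):
    """alpha and beta implied by the test mean and KR21 reliability."""
    total = n / kr21 - n          # alpha + beta
    alpha = total * mean / n
    return alpha, total - alpha

def joint_prob(n, alpha, beta, x, y):
    """P(X = x, Y = y) for two n-item tests under the assumed model."""
    log_term = (log_beta(alpha + x + y, 2 * n + beta - x - y)
                - log_beta(alpha, beta))
    return math.comb(n, x) * math.comb(n, y) * math.exp(log_term)

def classification_table(n, alpha, beta, c):
    """2 x 2 table of joint proportions (0 = nonmaster, 1 = master)."""
    table = [[0.0, 0.0], [0.0, 0.0]]
    for x in range(n + 1):
        for y in range(n + 1):
            table[int(x >= c)][int(y >= c)] += joint_prob(n, alpha, beta, x, y)
    return table

# Example: n = 10 items, test mean 5, KR21 = .50, passing score c = 6.
alpha, beta = beta_parameters(10, 5.0, 0.5)
table = classification_table(10, alpha, beta, 6)
p = table[0][0] + table[1][1]
masters = table[1][0] + table[1][1]
p_c = masters ** 2 + (1.0 - masters) ** 2
kappa = (p - p_c) / (1.0 - p_c)
print(round(p, 3), round(kappa, 3))
```

The resulting table can be passed to Equations (2), (3), and (4) to obtain a G-value for the repeated-testing estimate of kappa, and the printed p and kappa can be compared with the corresponding tabled entries.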

Table 1 reports the obtained G-values when the two procedures
for estimating p and k are used. The G-values in the table clearly 
demonstrate that the standard error associated with the single 
administration (beta-binomial) procedure is uniformly smaller than 
that encountered with the procedure using two test administrations. 
Over the thirteen situations reported in Table 1, the standard 
errors for the single administration procedure average 59.3% of 
those from repeated administrations for the p index and 53.2% for 
the kappa index. 

TABLE 1 

G-Values for Beta-Binomial Test Data 

4. A COMPARISON OF THE ASYMPTOTIC BIAS AND 
STANDARD ERRORS OF ESTIMATE FOR CTBS TEST DATA 

This phase of the study is motivated by the fact that real 
test data rarely conform exactly to a well-specified model such as 
the beta-binomial distribution. It is based on a portion of the 
Comprehensive Tests of Basic Skills (CTBS) test data collected in 
the 1978 South Carolina Statewide Testing Program. Table 2 
describes the various tests artificially assembled from CTBS 
subtests or from the entire battery. For each test in the listing, 
two alternate (hopefully equivalent) forms were created by pairing 
items on the basis of content and/or difficulty and randomly 
assigning the items in each pair to the alternate forms. For reasons 
which will be obvious later on, a number of tests were deliberately 
constructed of items of similar difficulty. 

The number of items (n) was set at 5, 10, 15, and 20. The 
number of students, selected by taking every tenth case from the 
entire South Carolina file, ranged from m = 1684 to 6035. For each 
test, the $D_{\max}$ value represents the maximum discrepancy between 
the observed relative cumulative frequency and the corresponding 
expected frequency from the beta-binomial model. A significance 
level (P-value) of more than .20 indicates that the test data closely 
follow the beta-binomial distribution. On the other hand, P-values 
of less than .05 or .01 reveal substantial departures from the 
theoretical distribution. 

For each test described in Table 2, the population values $p_R$, 
$G(\hat{p}_R)$, $\kappa_R$, and $G(\hat{\kappa}_R)$ were computed using 
the bivariate frequency distribution generated by the alternate 
forms. The corresponding parameters $p_B$, $G(\hat{p}_B)$, $\kappa_B$, 
and $G(\hat{\kappa}_B)$ were obtained by imposing the beta-binomial 
model on each of the two alternate forms and averaging the two sets 
of results. Now both $\hat{p}_B$ and $\hat{\kappa}_B$ are asymptotically 
unbiased estimates of $p_B$ and $\kappa_B$ (Huynh, 1978b). Also, since 
$\hat{p}_R$ is an unbiased estimate of $p_R$, and $\hat{\kappa}_R$ is an 
asymptotically unbiased estimate of $\kappa_R$, only the asymptotic 
bias of $\hat{p}_B$ and $\hat{\kappa}_B$ in estimating $p_R$ and 
$\kappa_R$ was explored. Thus, it follows that the percent asymptotic 
bias for $\hat{p}_B$ and $\hat{\kappa}_B$ is $100 (p_B - p_R)/p_R$ and 
$100 (\kappa_B - \kappa_R)/\kappa_R$, respectively. A negative bias 
indicates underestimation whereas a positive bias documents an 
overestimation. (We focused on $p_R$ and $\kappa_R$ because test 
reliability is typically approached from the standpoint of 
equivalent forms.) All computations reported in this section were 
carried out as in the previous section. 
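
The percent asymptotic bias defined above is a simple relative difference, and a minimal sketch of it follows. The function name and the numerical inputs are hypothetical illustrations (they are not values from Table 3); only the formula 100 (estimand - reference)/reference comes from the text.

```python
def percent_bias(model_value: float, reference_value: float) -> float:
    """Percent bias of a model-based value relative to a reference value.

    Implements 100 * (model_value - reference_value) / reference_value,
    e.g. 100 * (p_B - p_R) / p_R in the notation of the text.
    """
    if reference_value == 0:
        raise ValueError("reference value must be nonzero")
    return 100.0 * (model_value - reference_value) / reference_value


# Hypothetical population values, chosen only for illustration.
p_B, p_R = 0.78, 0.80
kappa_B, kappa_R = 0.45, 0.50

bias_p = percent_bias(p_B, p_R)              # negative: underestimation
bias_kappa = percent_bias(kappa_B, kappa_R)  # negative: underestimation
print(round(bias_p, 1), round(bias_kappa, 1))
```

With these made-up inputs the same absolute shortfall produces a larger percent bias for kappa than for p, because kappa itself is the smaller quantity.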
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TABLE 2 

Description of the CTBS Data Used in Sections 4 and 5 

Case              n    m      σ_diff           D_max (%)   P-value   Grade   Description 
5.1                5   1684   .056             1.80        >.20      3       Reading comprehension (paragraph) 
5.2                5   1684   .10[illegible]   0.68        >.20      3       Language expression 
5.3                5   5543   .003             0.50        >.20      3       Total battery 
10.1              10   1684   .060             2.24        >.20      3       Reading comprehension (sentences) 
10.2              10   6035   .081             1.54        >.15      6       Reading vocabulary 
10.3              10   5543   .007             2.02        <.05      3       Total battery 
15.1              15   1684   .175             1.72        >.20      3       Science 
15.[illegible]    15   1335   .022             3.85        <.05      6       Total battery 
20.1              20   1684   .099             4.01        <.01      3       Mathematics 
20.3              20   5543   .015             7.65        <.01      3       Total battery 

Table 3 details the results of the various estimates for $p_R$ 
and $\kappa_R$. The data indicate that the beta-binomial estimates 
($\hat{p}_B$ and $\hat{\kappa}_B$) tend to underestimate the 
alternate-form population values. 
For the p index, the percent of bias ranges from -4.2 to 0.1 with 
an average of -2.3. A larger degree of bias, however, occurs in 
the estimation of kappa via $\hat{\kappa}_B$. The percent of bias for 
this estimate ranges from -17.5 to 0.9 with an average of about -7.8. 

The larger bias of $\hat{\kappa}_B$ as compared with that of 
$\hat{p}_B$ is to be expected. With the factor $1 - p_c$ (which cannot 
exceed .50) in the denominator of Equation (1) defining kappa, the 
bias of $\hat{\kappa}_B$ is at least twice as large as that associated 
with $\hat{p}_B$. For situations in which a high proportion of 
examinees are to be classified either as masters or nonmasters, 
$1 - p_c$ is close to zero. As a consequence, the bias of 
$\hat{\kappa}_B$ will become more pronounced in those cases. 

The beta-binomial model assumes that test items are equally 
difficult (Huynh, 1976). It would be natural to expect that the 
bias of the beta-binomial estimates would bear a positive (or 
direct) relationship with variation in item difficulty. This is 
not the case, however. The values of $D_{\max}$ in Table 2 clearly 
indicate that departures from the beta-binomial distribution show 
no resemblance to the standard deviation ($\sigma_{diff}$) of item 
difficulty. 
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TABLE 3 

Percent Asymptotic Bias and G-Values for CTBS Test Data 

                              Index p                          Kappa 
Case    n   Cutoff   % Bias   G(p_B)   G(p_R)   % Bias   G(κ_B)        G(κ_R) 
5.1      5     3      -1.5     .174     .331     -7.1    .540          [illegible] 
               4      -3.5     .236     .350     -9.2    [illegible]   [illegible] 
5.2      5     3      -2.6     .192     .348    -13.7    .664          [illegible] 
               4      -4.7     .287     .391    -14.1    .593          .856 
5.3      5     3      -2.8     .211     .364    -17.5    .734          1.148 
               4      -3.4     .325     .429    -11.3    .667          .921 
10.1    10     6      -2.9     .113     .256    -10.2    .329          [illegible] 
               8      -4.2     .147     .281     -9.7    .294          [illegible] 
10.2    10     6      -1.3     .136     .330     -5.3    .384          .832 
               8      -3.6     .176     .347     -8.7    .345          .707 
10.3    10     6       0.7     .136     .332      2.5    .537          1.165 
               8      -1.2     .208     .385     -4.4    .441          .862 
15.1    15     9      -2.6     .203     .403     -8.1    .407          .809 
              13      -3.7     .164     .317     -7.6    .530          1.300 
15.2    15     9      -1.9     .168     .393     -4.0    .351          .881 
              13      -0.4     .141     .295     -7.1    .506          1.313 
20.1    20    12      -2.7     .098     .241    -12.9    .412          1.040 
              14      -2.8     .115     .292     -7.7    .353          .880 
20.2    20    12       0.1     .132     .370      0.9    .267          .751 
              14      -0.7     .121     .355      0.0    .283          .805 



The same observation holds for the bias of $\hat{p}_B$ and 
$\hat{\kappa}_B$ as displayed in Table 3. 

The G-values of Table 3 clearly show that the estimates based 
on the beta-binomial model have a smaller standard error of 
estimate than those based on alternate forms. Over all the 
situations considered, the standard error of $\hat{p}_B$ is about 
50.4% of that of $\hat{p}_R$; the standard error of $\hat{\kappa}_B$ is 
about 50.2% of that of $\hat{\kappa}_R$. These results are consistent 
with those of Section 3. 

5. A COMPARISON OF FINITE-SAMPLE BIAS AND STANDARD 
ERRORS OF ESTIMATE FOR CTBS TEST DATA 

A simulation was conducted to study the sampling fluctuations 
of the estimates $\hat{p}_R$, $\hat{\kappa}_R$, $\hat{p}_B$, and 
$\hat{\kappa}_B$ when sample sizes are small or moderate. This was 
done for samples of size m = 20, 40, and 60. For each test, one 
thousand replications were used to obtain the observed percent of 
bias and G-value for $\hat{p}_R$ and $\hat{\kappa}_R$. As for estimates 
based on the beta-binomial model, one thousand replications were 
simulated for each alternate form and the averages of the two sets 
of results were used to determine the bias and G-value for 
$\hat{p}_B$ and $\hat{\kappa}_B$. 

Table 4 presents a summary of the results of the simulation. The 
adequacy of the random number generator (more specifically, the 
IMSL (1977) subroutine GGUB) is documented by the near zero bias 
of $\hat{p}_R$ and the small fluctuation of the $G(\hat{p}_R)$ values 
for various sample sizes around the corresponding true values 
(enclosed in parentheses). The data reported in the table clearly 
show that, as in the case of large samples, the beta-binomial model 
tends to underestimate the parameters $p_R$ and $\kappa_R$. The bias 
of $\hat{p}_B$ in estimating $p_R$ averages -2.6%. For kappa, the bias 
of $\hat{\kappa}_B$ fluctuates around -11.0%. It is also interesting to 
note that the alternate form estimate, $\hat{\kappa}_R$, also tends to 
have a small negative bias. 

TABLE 4 

Percent Finite-Sample Bias and G-Values for CTBS Test Data 

                                  p_B               p_R               κ_B               κ_R 
Case    n   Cutoff    m       % Bias   G        % Bias   G        % Bias   G        % Bias   G 
             Score 
5.1      5     3      20       -0.5   .186       -.4    .325       -8.6   .617       +1.5   1.005 
                      40       -0.1   .184       -.1    .335       -7.7   .569       -1.3    .936 
                      60       -1.1   .188       -.1    .334       -7.4   .553       -0.3    .930 
                 (Exact value                     0     .331) 
10.1    10     7      20       -3.6   .141       -.1    .225      -11.9   .376       -1.3    .678 
                      40       -3.9   .146        .2    .269      -11.6   .327       -1.2    .644 
                      60       -4.0   .145       -.1    .268      -11.4   .304       -0.4    .625 
                 (Exact value                     0     .259) 
15.1    15    11      20       -3.4   .210       -.4    .395      -15.1   .543       -2.4    .949 
                      40       -3.8   .206        .3    .402      -13.4   .525       -2.2    .927 
                      60       -3.7   .203       -.2    .397      -13.0   .523       -0.1    .927 
                 (Exact value                     0     .392) 
20.1    20    14      20       -0.7   .141       -.2    .293      -12.7   .585       -5.0   1.017 
                      40       -2.6   .137        0     .306      -10.2   .519       -1.3    .961 
                      60       -2.6   .142        .2    .312       -9.2   .499       -2.2    .942 
                 (Exact value                     0     .292) 

The data in Table 4 show that the beta-binomial estimates have 
smaller sampling fluctuations than the alternate form estimates. 
For all situations reported in this table, the standard error of 
$\hat{p}_B$ is about 51.4% of that of $\hat{p}_R$; and the standard 
error of $\hat{\kappa}_B$ is about 56.9% of the standard error of 
$\hat{\kappa}_R$. These trends are very similar to those reported in 
the previous section. 

6. DISCUSSION AND CONCLUSION 

In this study the performance of a single administration 
estimate of reliability for mastery tests is compared with the 
behavior of the estimate based on two test administrations. The 
results clearly indicate that the single administration 
(beta-binomial) estimate for the raw agreement index p behaves very 
well. Not only does it show a negligible amount of negative bias, 
its sampling error is about half of that of the test-retest 
procedure. As for the kappa index, a moderate degree of negative 
bias (about ten percent) is displayed by the beta-binomial estimate. 
This estimate of kappa also has a standard error that is about 
one-half the corresponding value for the alternate form estimate. 
Though the beta-binomial estimates are originally derived for tests 
with items of equal difficulty, the data presented indicate that the 
bias of these estimates does not depend on the assumption of equal 
difficulty for test items. Our conclusion is that for testing 
situations involving tests like the CTBS (with items of a wide range 
of difficulty), the estimation for consistency of decisions in 
mastery tests may be safely carried out via one test administration 
with the beta-binomial model as a vehicle for computation. 
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AN APPROXIMATION TO THE TRUE ABILITY 
DISTRIBUTION IN THE BINOMIAL ERROR MODEL 
AND APPLICATIONS 



Huynh Huynh 
Garrett K. Mandeville 
University of South Carolina 



ABSTRACT 

Assuming that the density p of the true ability $\theta$ in the 
binomial test score model is continuous in the closed interval 
[0,1], a Bernstein polynomial can be used to uniformly approximate 
p. Then via quadratic programming techniques, least-square 
estimates may be obtained for the coefficients defining the 
polynomial. The approximation, in turn, will yield estimates for any 
indices based on the univariate and/or bivariate density function 
associated with the binomial test score model. Numerical 
illustrations are provided for the projection of decision 
reliability and proportion of success in mastery testing. 



1. INTRODUCTION 

The binomial error model (Lord and Novick, 1968) has been used 
extensively in analyses of mental test data. The model is deemed 
suitable in computer-assisted testing in which each examinee is 
given a random sample of items drawn from a large item universe. 
When the same test is given to all examinees, the binomial 
distribution implies that all items share the same difficulty level. 
There are indications (Keats and Lord, 1962; Duncan, 1974) that 
several test score distributions based on the same test fit the 
binomial (or more specifically the beta-binomial) model quite well, 
especially when similarity of item difficulty holds strictly or 
nearly. Let x denote the test score obtained from the administration 
of an n-item test to an examinee with true ability $\theta$ (the 
proportion of items in the universe that he/she knows, or the 
probability of answering each item correctly). Then the conditional 
density of x given $\theta$ is 

$$f(x \mid \theta) = \binom{n}{x} \theta^{x} (1 - \theta)^{n-x}, \qquad x = 0, 1, \ldots, n.$$

Let $p(\theta)$ be the density of the true ability for a population of 
examinees. The marginal density of x for this population is given 
as 

$$f(x) = \binom{n}{x} \int_0^1 \theta^{x} (1 - \theta)^{n-x} p(\theta) \, d\theta.$$

As indicated in Lord and Novick (1968; Chapter 23), the knowledge 
of f(x) implies the knowledge of the first n moments of the 
distribution of $\theta$. Any distribution sharing these n moments will 
yield the same marginal density f(x), hence the solution for 
$p(\theta)$ given f(x) is not unique. We will seek an approximation for 
$p(\theta)$ via a polynomial and will show how such approximation is 
useful in the projection of decision reliability and proportion of 
successes in mastery testing. 

This paper has been distributed separately as RM 79-5, June, 1979. 

2. A SOLUTION BASED ON THE BERNSTEIN POLYNOMIAL 

We shall assume that $p(\theta)$ is continuous in the closed interval 
[0,1]. Then (Feller, 1966, p. 220) $p(\theta)$ can be uniformly 
approximated by a Bernstein polynomial of the form 

$$B_m(\theta) = \sum_{k=0}^{m} z_k \binom{m}{k} \theta^{k} (1 - \theta)^{m-k}.$$

Thus given any arbitrarily small and positive $\epsilon$, there exists an 
integer m and (m + 1) constants $z_k$ such that 
$|B_m(\theta) - p(\theta)| < \epsilon$ for all $\theta \in [0,1]$. We propose 
to use $B_m(\theta)$ to approximate $p(\theta)$. Procedures will be 
presented for the determination of the constants 
m, $z_0, z_1, \ldots, z_m$. 

It may first be noted that the $z_k$ constants must be non-negative 
and satisfy the constraint $\int_0^1 B_m(\theta) \, d\theta = 1$ in order 
for $B_m(\theta)$ to be a density. Hence 

$$\sum_{k=0}^{m} z_k \binom{m}{k} \int_0^1 \theta^{k} (1 - \theta)^{m-k} \, d\theta = 1$$

or equivalently 

$$\sum_{k=0}^{m} z_k = m + 1.$$
The Bernstein approximated value for the marginal density of x is 
now given as 

$$f_B(x) = \binom{n}{x} \sum_{k=0}^{m} z_k \binom{m}{k} J(n + m; x + k)$$

where 

$$J(n + m; x + k) = \int_0^1 \theta^{x+k} (1 - \theta)^{n+m-(x+k)} \, d\theta.$$

The J integrals may be computed inductively by noting that 

$$J(p; 0) = 1/(p + 1)$$

and 

$$J(p; y + 1) = (y + 1) J(p; y)/(p - y).$$

Now let 

$$c(k,x) = \binom{n}{x} \binom{m}{k} J(n + m; x + k)$$

and 

$$a(k,x) = c(k,x) - c(0,x).$$

Then the approximated marginal density of x becomes 

$$f_B(x) = \sum_{k=1}^{m} a(k,x) z_k + (m + 1) c(0,x)$$

where the $z_k$, k = 1, 2, ..., m, are nonnegative and sum up to no 
more than m + 1. 

To determine the constants m, $z_1, z_2, \ldots, z_m$, we focus on the 
least-square criterion with the weight function w(x) 

$$H(z_1, z_2, \ldots, z_m; m) = \sum_{x=0}^{n} w(x) [f_B(x) - f(x)]^2. \tag{1}$$

In other words, we will seek these constants in such a way that the 
H criterion is minimized. This may be done by first considering 
m as fixed and computing the z constants along with the minimum 
$H_m$ of the criterion H. This process will be repeated many times 
starting with m = 0 [$p(\theta)$ and $f_B(x)$ are constant], 1, 2, etc. 
until an integer m can be located at which $H_m$ is minimized. 
Following are the details for the algorithm. 

2.1 Minimizing H at Each Integer m. Let 

$$\beta(x) = (m + 1) c(0,x) - f(x).$$

Then (1) becomes 

$$H = \sum_{x=0}^{n} w(x) \Big[ \sum_{k=1}^{m} a(k,x) z_k + \beta(x) \Big]^2. \tag{2}$$

At each given integer m, the nonnegative $z_1, z_2, \ldots, z_m$ may be 
obtained by minimizing H under the constraint 
$\sum_{k=1}^{m} z_k \le m + 1$. Since H is continuous and the z's are 
located in a closed region, the solution for z always exists. To 
obtain such a solution, standard routines for quadratic programming 
may be called upon. In this paper, Algorithm 431 (Ravindran, 1972) 
was used. 

To enter into Algorithm 431, we note that the criterion H of 
(2) may be written as 

$$H = Z'DZ + 2B'Z + C.$$

In this formula, Z is the vector $(z_1, z_2, \ldots, z_m)'$, 
$D = (d_{kk'})$ is the matrix defined by 

$$d_{kk} = \sum_{x=0}^{n} w(x) [a(k,x)]^2,$$

$$d_{kk'} = \sum_{x=0}^{n} w(x) a(k,x) a(k',x),$$

and $B = (b_k)$ is a vector with components 

$$b_k = \sum_{x=0}^{n} w(x) a(k,x) \beta(x).$$

The remaining quantity C is the constant 

$$C = \sum_{x=0}^{n} w(x) [\beta(x)]^2.$$

2.2 Searching the Least Square Solution. We note that when m = 0, 
the minimum value $H_0$ of H is simply 

$$H_0 = \sum_{x=0}^{n} w(x) [f(x) - \bar{f}]^2$$

where 

$$\bar{f} = \sum w(x) f(x) / \sum w(x).$$

As for other m values, the minimum may be deduced from the quadratic 
programming. Thus the least square solution for the Bernstein 
polynomial may be obtained by computing $H_0, H_1, H_2, \ldots$ for 
several consecutive values of m, and locating the value of m at 
which H is the smallest. Since the criterion for minimization H is 
non-negative, all computations shall stop whenever $H_m = 0$. In other 
situations, a tolerance difference between $H_m$ and $H_{m+1}$ might 
have to be set up in order to end the approximation process. 
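
The building blocks of Section 2 can be sketched in a few lines of Python. The sketch below computes the J integrals by the stated recursion, evaluates the approximated marginal density f_B(x) for given constants z_0, ..., z_m, and evaluates the criterion H. It does not implement the quadratic programming step (Algorithm 431); the function names, the default weights w(x) = 1, and the "observed" proportions are assumptions made for illustration only. The z constants are those of the numerical example in Section 4.1.

```python
from math import comb


def j_integrals(p):
    """Return [J(p; 0), ..., J(p; p)] using J(p; 0) = 1/(p + 1) and
    J(p; y + 1) = (y + 1) * J(p; y) / (p - y)."""
    values = [1.0 / (p + 1)]
    for y in range(p):
        values.append((y + 1) * values[y] / (p - y))
    return values


def bernstein_marginal(n, z):
    """Approximated marginal density f_B(x), x = 0..n, for z = [z_0, ..., z_m]."""
    m = len(z) - 1
    J = j_integrals(n + m)
    return [
        comb(n, x) * sum(z[k] * comb(m, k) * J[x + k] for k in range(m + 1))
        for x in range(n + 1)
    ]


def h_criterion(f_obs, z, w=None):
    """Weighted least-square criterion H between f_B(x) and observed f(x)."""
    n = len(f_obs) - 1
    if w is None:
        w = [1.0] * (n + 1)  # assumption: equal weights w(x) = 1
    f_fit = bernstein_marginal(n, z)
    return sum(wx * (fb - fo) ** 2 for wx, fb, fo in zip(w, f_fit, f_obs))


# Constants of the numerical example in Section 4.1 (n = 5, m = 4).
z_example = [1.0, 1.5, 2.0, 0.0, 0.5]
f_fit = bernstein_marginal(5, z_example)

# Hypothetical observed proportions, used only to exercise h_criterion.
f_obs = [0.20, 0.21, 0.20, 0.17, 0.13, 0.09]
h_value = h_criterion(f_obs, z_example)
print([round(v, 5) for v in f_fit], h_value)
```

In an actual fit, h_criterion would be minimized over nonnegative z_1, ..., z_m with sum at most m + 1 by a quadratic programming routine, and the search over m would compare the resulting minima.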

3. NUMERICAL ILLUSTRATION 

To illustrate the computational algorithm described in the 
previous section, three score frequency distributions based on 
n = 10 test items are used. For Data Set 1, almost all frequencies 
are concentrated at the upper end of the score range. Data Set 2 
is slightly asymmetric and Data Set 3 has two modes, one near each 
end of the score range. Details regarding these data sets are 
presented in Table 1. 

It appears from Table 1 that the goodness of fit via the 
Bernstein polynomial improves when the degree of the polynomial 
increases. For unimodal distributions, the algorithm tends to put 
all the weights at only a few terms which correspond to some 
consecutive $z_k$ values. On the other hand, for a bimodal 
distribution such as Data Set 3, the algorithm puts the total weight 
on two blocks, each being formed by some consecutive z values. 

TABLE 1 

Observed and Fitted Frequency Distributions 
for Three Data Sets 

                     Data Set 1           Data Set 2           Data Set 3 
Test Score        Observed  Fitted     Observed  Fitted     Observed  Fitted 
 0                    0       .00          0       .06          4      6.09 
 1                    0       .00          0       .37         10     10.28 
 2                    0       .01          1      1.26         15     10.16 
 3                    0       .07          3      3.07          8      8.68 
 4                    1       .23          6      5.97          6      9.16 
 5                    1       .69         10      9.66         10     12.64 
 6                    3      1.82         13     13.28         20     17.49 
 7                    5      4.42         16     15.47         25     20.22 
 8                    8      9.93         15     14.88         15     17.89 
 9                   15     20.96         11     11.00         10     10.89 
10                   47     41.91          5      4.97          4      3.50 

Degree of the 
Bernstein polynomial:    10                   10                   24 
Minimum H_m:           .0106                .0001                .0052 

The positive z constants: 
  Data Set 1:  z_10 = 11.0000 
  Data Set 2:  z_7 = 9.8912,  z_8 = 1.1088 
  Data Set 3:  z_4 = 6.2830,  z_5 = 1.3349, 
               z_1[illegible] = 14.4010,  z_1[illegible] = 3.9830 
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4. PROJECTION OF DECISION RELIABILITY 

Consider now two equivalent tests X and Y, each with n items. 
If the test score distributions are binomial, then the bivariate 
density is given as 

$$f(x,y) = \binom{n}{x} \binom{n}{y} \int_0^1 \theta^{x+y} (1 - \theta)^{2n-(x+y)} p(\theta) \, d\theta.$$

Let the density p be approximated from the data collected with one 
test as 

$$B_m(\theta) = \sum_{k=0}^{m} z_k \binom{m}{k} \theta^{k} (1 - \theta)^{m-k}.$$

Then f(x,y) will be given by the expression 

$$f_B(x,y) = \binom{n}{x} \binom{n}{y} \sum_{k=0}^{m} z_k \binom{m}{k} J(2n + m; x + y + k)$$

where the function J is defined as previously in Section 2. The 
expressions for $f_B(x)$ and $f_B(x,y)$ may now be used to project 
practically all agreement indices for decisions in mastery testing. 
Let the examinees now be classified in k categories $A_i$ defined by 
$A_i = \{x; c_{i-1} \le x < c_i\}$ where $c_0 = 0$ and $c_k = n + 1$. 
For binary classifications k = 2. In this case $c_1$ is usually 
referred to as the cutoff (mastery) score. The raw agreement index 

$$p = \sum_{i=1}^{k} P[(X,Y) \in A_i \times A_i]$$

can be computed by the formula 

$$p = \sum_{i=1}^{k} \Big[ \sum_{c_{i-1} \le x, y < c_i} f_B(x,y) \Big].$$

On the other hand, the corrected-for-chance kappa index is given as 
$\kappa = (p - p_c)/(1 - p_c)$ where 

$$p_c = \sum_{i=1}^{k} \Big[ \sum_{c_{i-1} \le x < c_i} f_B(x) \Big]^2.$$

4.1 Numerical Example. Consider the case where n = 5, m = 4 and 
$z_0 = 1.0$, $z_1 = 1.5$, $z_2 = 2.0$, $z_3 = 0$ and $z_4 = .5$. The 
Bernstein polynomial generates the marginal frequency density of 
.20040, .21230, .20040, .16865, .12698 and .09127 at the test scores 
of 0, 1, 2, 3, 4, and 5. For the binary classifications with cutoff 
score 4, the raw agreement index is .8197 and the kappa index is 
.4716. 
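
The numerical example can be reproduced with a short sketch. It assumes that the bivariate density carries the two binomial coefficients for x and y, and it uses the closed form J(p; y) = 1/[(p + 1) C(p, y)], which is equivalent to the recursion of Section 2. Function names are hypothetical.

```python
from math import comb


def j_integral(p, y):
    """J(p; y): integral of t**y * (1 - t)**(p - y) over [0, 1] (closed form)."""
    return 1.0 / ((p + 1) * comb(p, y))


def marginal(n, z):
    """Approximated marginal density f_B(x), x = 0..n."""
    m = len(z) - 1
    return [
        comb(n, x) * sum(z[k] * comb(m, k) * j_integral(n + m, x + k)
                         for k in range(m + 1))
        for x in range(n + 1)
    ]


def bivariate(n, z):
    """Approximated bivariate density f_B(x, y) for two n-item tests."""
    m = len(z) - 1
    return [
        [
            comb(n, x) * comb(n, y)
            * sum(z[k] * comb(m, k) * j_integral(2 * n + m, x + y + k)
                  for k in range(m + 1))
            for y in range(n + 1)
        ]
        for x in range(n + 1)
    ]


def agreement_indices(n, z, cutoffs):
    """Raw agreement p and kappa; cutoffs are the interior scores c_1 < ... ."""
    bounds = [0] + list(cutoffs) + [n + 1]
    f1 = marginal(n, z)
    f2 = bivariate(n, z)
    p = 0.0
    p_chance = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # Category A_i contains the scores lo, ..., hi - 1.
        p += sum(f2[x][y] for x in range(lo, hi) for y in range(lo, hi))
        p_chance += sum(f1[lo:hi]) ** 2
    kappa = (p - p_chance) / (1.0 - p_chance)
    return p, kappa


p, kappa = agreement_indices(5, [1.0, 1.5, 2.0, 0.0, 0.5], [4])
print(round(p, 4), round(kappa, 4))
```

Passing more than one interior cutoff to agreement_indices gives the indices for classifications with more than two categories.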

5. PROJECTION OF TEST SCORE DISTRIBUTIONS 
FOR LENGTHENED TESTS 

There are situations in which a test needs to be lengthened in 
order to accommodate new conditions and data are available for the 
short version of the test. If the binomial model holds, then it is 
possible to project the test score distribution for a lengthened 
test, assuming that the ability distribution of the examinees 
remains unchanged. From the data for the short form, it may be 
possible to approximate the true ability distribution via the 
Bernstein polynomial 

$$B_m(\theta) = \sum_{k=0}^{m} z_k \binom{m}{k} \theta^{k} (1 - \theta)^{m-k}.$$

For a lengthened test consisting of $l$ items, the projected density 
function for the test score is given as 

$$f(x) = \binom{l}{x} \int_0^1 \theta^{x} (1 - \theta)^{l-x} p(\theta) \, d\theta \approx \binom{l}{x} \sum_{k=0}^{m} z_k \binom{m}{k} J(l + m; x + k).$$

5.1 Numerical Example. Consider the case where the fitting via a 
4th degree Bernstein polynomial (m = 4) yields the constants 
$z_0 = 1.0$, $z_1 = 1.5$, $z_2 = 2.0$, $z_3 = 0$ and $z_4 = .5$. For a 
test with $l$ = 10 items, the projected density is .10406, .11372, 
.11888, .11905, .11422, .10489, .09207, .07726, .06244, .05012 and 
.04329 at the test scores of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10. 
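
A minimal sketch of this projection follows, using the constants of the numerical example. The function name is hypothetical, and the closed form J(p; y) = 1/[(p + 1) C(p, y)] is used in place of the recursion of Section 2 (the two are equivalent).

```python
from math import comb


def project_density(length, z):
    """Projected score density for a test of `length` items, given
    Bernstein constants z = [z_0, ..., z_m] fitted on the short form."""
    m = len(z) - 1
    total = length + m

    def j(y):
        # J(total; y): integral of t**y * (1 - t)**(total - y) over [0, 1].
        return 1.0 / ((total + 1) * comb(total, y))

    return [
        comb(length, x) * sum(z[k] * comb(m, k) * j(x + k) for k in range(m + 1))
        for x in range(length + 1)
    ]


projected = project_density(10, [1.0, 1.5, 2.0, 0.0, 0.5])
print([round(v, 5) for v in projected])
```

The projected values form a proper density over the scores 0 through 10, so they sum to one; this gives a quick check on any tabulated projection.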
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ADEQUACY OF ASYMPTOTIC NORMAL THEORY IN ESTIMATING RELIABILITY 
FOR MASTERY TESTS BASED ON THE BETA-BINOMIAL MODEL 

Huynh Huynh 
University of South Carolina 



ABSTRACT 

Simulated data based on five test score distributions indicate 
that a slight modification of the asymptotic normal theory for the 
estimation of the p and kappa indices in mastery testing will 
provide results which are in close agreement with those based on 
small samples. The modification is achieved through the 
multiplication of the asymptotic standard errors of estimate by the 
constant $1 + m^{-3/4}$, where m is the sample size. 



1. INTRODUCTION 

A primary purpose of mastery testing is to classify examinees 
in several achievement (or ability) categories. Typically, there 
are two such categories, mastery and nonmastery. The reliability 
of mastery tests is often viewed as the consistency of the various 
classifications across two test administrations; this consistency 
may be quantified via the raw agreement index (p) or the kappa 
index ($\kappa$). The raw agreement index is simply the combined 
proportion of examinees classified consistently as masters or 
nonmasters (if there are only two categories) on the two test 
administrations. The kappa index, on the other hand, expresses the 
extent to which the test scores improve the consistency of 
decisions beyond what would be expected by chance. Details 
regarding the nature and use of these indices may be found in 
Swaminathan, Hambleton, and Algina (1974), Huynh (1976, 1978a), 
and Subkoviak (1976, 1980). 

This paper has been distributed separately as RM 80-2, July, 1980. 

Although p and $\kappa$ are defined in terms of repeated testing, 
practical considerations often necessitate their estimation on the 
basis of test data collected from a single test administration. 
This may be done, for example, via the beta-binomial model (Huynh, 
1976, 1979). The data reported in Subkoviak (1978), and by Huynh 
and Saunders (in press) tend to indicate that the beta-binomial 
model yields reasonably accurate estimates for p and $\kappa$ in 
situations involving educational tests such as the Scholastic 
Aptitude Test and the Comprehensive Tests of Basic Skills. 

The beta-binomial model also provides a convenient way to 
study the asymptotic sampling characteristics of the estimates. 
Let $\hat{p}$ and $\hat{\kappa}$ denote the (moment or maximum 
likelihood) estimates for p and $\kappa$, and let m be the number of 
examinees. Then $\sqrt{m} (\hat{p} - p)$ and 
$\sqrt{m} (\hat{\kappa} - \kappa)$ follow asymptotically two normal 
distributions, each with a mean of zero and a standard deviation of 
$G(\hat{p})$ or $G(\hat{\kappa})$ (Huynh, 1978b, 1979). The constants 
$G(\hat{p})$ and $G(\hat{\kappa})$ depend only on the number of items 
(n), the mean ($\mu$) and standard deviation ($\sigma$) of the test 
scores, and the cutoff score (c). They are not functions of the 
sample size m, and may be computed via formulae, tables, or 
computer program (Huynh, 1978b, 1979). 

The asymptotic considerations just summarized indicate that 
the estimation errors $\hat{p} - p$ and $\hat{\kappa} - \kappa$ follow 
approximately normal distributions with means of zero and standard 
deviations of $\sigma_\infty(\hat{p}) = G(\hat{p})/\sqrt{m}$ and 
$\sigma_\infty(\hat{\kappa}) = G(\hat{\kappa})/\sqrt{m}$ when the sample 
size m is sufficiently large. 

The extent to which these "asymptotic" standard errors reveal 
adequately the corresponding values in small samples appears to be 
unknown. Further, if $s_\infty(\hat{p})$ and $s_\infty(\hat{\kappa})$ 
represent the asymptotic standard errors computed from the sample 
data, asymptotic theory holds that the sampling distributions of 
the two ratios, $z(\hat{p}) = (\hat{p} - p)/s_\infty(\hat{p})$ and 
$z(\hat{\kappa}) = (\hat{\kappa} - \kappa)/s_\infty(\hat{\kappa})$, are 
approximately normal distributions with zero means and unit 
variances. The degree to which this asymptotic normality is true 
for small samples has yet to be investigated. 

The purpose of this paper is threefold. It will first assess 
the adequacy of using the asymptotic standard errors to approximate 
the actual values encountered in small samples. Then, it will look 
at the degree to which asymptotic normal distributions can be used 
to describe the actual sampling distributions of the ratios z(p) 
and z(k) when small samples are used. Finally, the paper also 
suggests a slight adjustment to the results of the asymptotic 
theory so that they will resemble more closely the results associ- 
ated with small or moderate samples. 

2. PROCEDURES 

Let $\sigma_m(\hat{p})$ and $\sigma_m(\hat{\kappa})$ be the actual standard 
errors associated with a sample of size m. The closeness of the 
asymptotic approximations to these actual standard errors, when 
small samples are employed, may be assessed by computing the 
relative errors of approximation: 
$\epsilon(\hat{p}) = [\sigma_m(\hat{p}) - \sigma_\infty(\hat{p})]/\sigma_m(\hat{p})$ 
and 
$\epsilon(\hat{\kappa}) = [\sigma_m(\hat{\kappa}) - \sigma_\infty(\hat{\kappa})]/\sigma_m(\hat{\kappa})$, 
respectively. Approximations are said to be good when the ratios, 
$\epsilon(\hat{p})$ and $\epsilon(\hat{\kappa})$, are close to zero. In 
most practical situations, a ratio falling within ±5% should 
probably be considered as evidence of acceptable approximation. 

As stated in the introduction, the asymptotic standard errors 
$\sigma_\infty(\hat{p})$ and $\sigma_\infty(\hat{\kappa})$ may be computed 
for a given test score distribution. Since no simple formulae 
appeared available for the computation of the small sample standard 
errors $\sigma_m(\hat{p})$ and $\sigma_m(\hat{\kappa})$, computer 
simulation with 5000 replications was used in order to estimate 
their values as well as the relative errors of approximation 
$\epsilon(\hat{p})$ and $\epsilon(\hat{\kappa})$. 

Computer simulation with 5000 replications was also used to 
assess the adequacy of using the unit normal distribution to 
describe the sampling distributions of the ratios $z(\hat{p})$ and 
$z(\hat{\kappa})$. 
The proportions of the simulated z-ratios which fell within 
selected (two-sided) critical values were computed and compared 
with the corresponding values expected from a normal distribu- 
tion. The extent to which the proportions from the computer 
simulated distributions resembled the corresponding normal dis- 
tribution probabilities was used to assess the adequacy of the 
asymptotic normal distribution. For this study, (two-sided) 
critical values were selected so that the central portion of the 
unit normal distribution was covered corresponding to probabilities 
of 80%, 90%, 95%, and 99%. 

Both the moment and maximum likelihood (ML) estimates were 
used in this study. Moment estimates exist when the sample 
reliability index, KR21, is positive. When this was not the case, it 
was then assumed (as in Wilcox, 1977) that the beta-binomial model 
degenerated to a binomial distribution with an estimated success 
probability of $\hat{X} = \bar{x}/n$ where $\bar{x}$ is the test mean. 
Under these conditions, the estimate for $\kappa$ was taken as zero, 
and that for p was computed via the expression 
$\hat{p} = P_0^2 + (1 - P_0)^2$ where 

$$P_0 = \sum_{x=0}^{c-1} \binom{n}{x} \hat{X}^{x} (1 - \hat{X})^{n-x}.$$

In addition, following the intuitive reasoning that degenerate 
cases only represent extreme situations, both the $z(\hat{p})$ and 
$z(\hat{\kappa})$ ratios were taken as extremely large whenever the 
degenerate case occurred. 
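
A minimal sketch of the degenerate (binomial) fallback follows. It rests on two assumptions that the printed formula does not show clearly: that P_0 is the binomial probability of scoring below the cutoff c, and that the estimate of p is P_0^2 + (1 - P_0)^2, which is the chance agreement implied by a kappa of zero. The function name and the sample inputs are hypothetical.

```python
from math import comb


def degenerate_estimates(n, cutoff, test_mean):
    """Fallback estimates (p_hat, kappa_hat) when KR21 is not positive.

    Assumes a binomial score distribution with success probability
    test_mean / n, so that two administrations are independent.
    """
    success = test_mean / n
    # Probability of a nonmastery score (x = 0, ..., cutoff - 1).
    p0 = sum(
        comb(n, x) * success ** x * (1.0 - success) ** (n - x)
        for x in range(cutoff)
    )
    p_hat = p0 ** 2 + (1.0 - p0) ** 2
    kappa_hat = 0.0
    return p_hat, kappa_hat


# Hypothetical example: 10 items, cutoff score 8, sample mean 8.0.
p_hat, kappa_hat = degenerate_estimates(10, 8, 8.0)
print(round(p_hat, 4), kappa_hat)
```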

Although the moment estimates are considerably easier to 
compute than the corresponding ML estimates, ML estimates often have 
been considered better than the moment estimates. (The asymptotic 
sampling distributions of the moment and ML estimates are the same, 
however.) Because of this, the comparisons previously described 
for the moment estimates were also made for ML estimates. The ML 
estimates were obtained via a Newton-Raphson iteration scheme 
described elsewhere (Huynh, 1977). In the rare instances where the 
ML iteration did not converge, the moment estimates were used. 

The data base for this study consisted of five beta-binomial 
distributions. Four tests consisting of n = 5, 10, 15, and 20 
items each were assembled by random selection of items from the 
Comprehensive Tests of Basic Skills, Form S, Level 1, which had 
been used in the South Carolina 1978 Statewide Testing Program. 
The actual frequency distribution for each of these tests was 
altered slightly so that the resulting distribution would conform 
almost exactly to a (marginal) beta-binomial distribution. 
Another beta-binomial distribution, with $\alpha$ = 8.970 and 
$\beta$ = 1.994, was patterned after the one used in the Wilcox (1977) 
study. Details regarding these distributions and the selected 
cutoff scores c may be found in Table 1. For each case listed in 
this table, five thousand replications were simulated to estimate 
various standard errors and sampling distributions. The sample 
size m was selected to be 25, 50, 100, 200, and 400. 

TABLE 1 

Descriptions of the Five Tests used in the Simulation 

Case   Source    n     Mean       SD        α         β        KR21     c 
1      CTBS       5     3.7066    1.5445    1.2512    0.4367   .7476     3 
2      CTBS      10     7.4702    2.9435    1.1285    0.3822   .8688     6 
3      Wilcox    10     8.1814    1.6147    8.9703    1.9940   .4770     8 
4      CTBS      15     8.8630    3.3588    3.3273    2.3039   .7271     9 
5      CTBS      20    11.1811    5.1115    1.9115    1.5077   .8540    12 

Preliminary simulations indicated that the asymptotic standard
errors tended to underestimate the small-sample standard errors,
and that an adjustment via the multiplicative constant
$h = 1 + 1/m^{3/4}$ would substantially improve the adequacy of the
results deduced from the asymptotic theory. Hence, adjusted
asymptotic standard errors of the form
$\sigma^*_\infty = \sigma_\infty (1 + 1/m^{3/4})$ and adjusted z
ratios of the type $z^* = z/(1 + 1/m^{3/4})$ were also incorporated
in the study.
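The adjustment just described can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the relative error of approximation is assumed here to be 100 x (finite-sample s.e. - approximating s.e.) / finite-sample s.e., a definition inferred from the tabled values rather than stated in the text.

```python
def adjustment_factor(m: int) -> float:
    """Multiplicative constant h = 1 + 1/m^(3/4) for a sample of m examinees."""
    return 1.0 + 1.0 / m ** 0.75


def adjusted_se(asymptotic_se: float, m: int) -> float:
    """Adjusted asymptotic standard error: the asymptotic value times h."""
    return asymptotic_se * adjustment_factor(m)


def adjusted_z(z: float, m: int) -> float:
    """Adjusted z ratio: the unadjusted ratio divided by h."""
    return z / adjustment_factor(m)


def relative_error(finite_se: float, approx_se: float) -> float:
    """Relative error of approximation in percent (assumed definition)."""
    return 100.0 * (finite_se - approx_se) / finite_se


if __name__ == "__main__":
    for m in (25, 50, 100, 200, 400):
        print(m, round(adjustment_factor(m), 4))
```

Under this assumed definition, an asymptotic standard error that falls 8.2% short of the finite-sample value at m = 25 is brought to within about 0.1% of it after multiplication by h.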

3. RESULTS

Table 2 reports the relative errors of approximation,
$e(\hat{p})$ and $e(\hat{\kappa})$, for the asymptotic standard
errors of the moment and ML estimates. Values associated with the
adjusted asymptotic standard errors are enclosed within
parentheses. The table reveals the following points.

(a) The unadjusted asymptotic standard errors for both
$\hat{p}$ and $\hat{\kappa}$ are slightly closer to the
finite-sample standard errors of the ML estimates than to those
associated with the moment estimates. This result is not
unexpected: strictly speaking, asymptotic theory deals mainly with
ML estimates, which are asymptotically efficient (i.e., unbiased
with minimum variance). The asymptotic results, however, may be
applied to the less efficient moment estimates because these are
asymptotically equivalent to the ML estimates. Hence, the
asymptotic standard error should depict the sampling variability
of the ML estimates more accurately than that of the moment
estimates. However, the difference in accuracy is minimal when
sample sizes as small as 25 or 50 are used.

(b) The unadjusted asymptotic standard errors underrepresent
the corresponding finite-sample standard errors; the extent of
underrepresentation is less for $\sigma_\infty(\hat{p})$ than for
$\sigma_\infty(\hat{\kappa})$. As seen in the last four rows of
Table 2, the absolute relative errors of approximation
$e(\hat{p})$ average 8.3, 4.9, 3.3, 2.9, and 3.0 percent for sample
sizes of 25, 50, 100, 200, and 400, respectively. For
$\hat{\kappa}$, these percentages are 13.8, 7.6, 4.6, 4.0, and
2.9%.

(c) As mentioned in the last section, the multiplicative
adjustment via the constant $1 + 1/m^{3/4}$ produced adjusted
asymptotic standard errors $\sigma^*_\infty$ which were
substantially closer to their finite-sample values $\sigma$. For
these adjusted asymptotic standard errors, the absolute relative
errors of approximation of $\hat{p}$ average 1.3, 0.9, 0.9, 1.6,
and 1.9 percent for m = 25, 50, 100, 200, and 400, respectively.
As for $\hat{\kappa}$, these average absolute relative errors stand
at 6.7, 2.7, 1.9, 2.2, and 1.8%.

(d) As expected, the asymptotic standard errors resemble more
closely those estimated for finite samples as the sample size m
becomes larger. Sampling errors associated with the simulation
probably account for the erratic behavior of the estimated
finite-sample standard errors found at a few places in Table 2.

TABLE 2

Relative approximation errors associated with the asymptotic
standard errors and with the adjusted asymptotic standard errors (a)

                        Relative approximation error (in percent) at m =
Case  Index  Estimate      25           50          100          200          400

 1     p     Moment        --           --        1.9(-1.2)    1.9( 0.0)    1.4( 0.3)
             ML         8.2( 0.0)    3.6(-1.6)    0.3(-2.9)    0.3(-1.6)   -0.2(-1.3)
       κ     Moment        --           --        2.3(-0.8)    2.6( 0.7)    1.9( 0.8)
             ML        11.8( 3.9)    6.1( 1.1)    0.9(-2.2)    1.1(-0.8)    0.3(-0.8)

 2     p     Moment        --        5.7( 0.7)    5.7( 2.7)    5.9( 4.1)    5.9( 4.8)
             ML         4.4(-4.1)    1.5(-3.8)    1.3(-1.9)    0.3(-1.6)    0.2(-0.9)
       κ     Moment    20.4(13.3)   10.4( 5.6)    6.2( 3.2)    4.7( 2.9)    3.6( 2.5)
             ML        17.8(10.4)    7.6( 2.7)    3.4( 0.3)    1.4(-0.5)    0.0(-1.1)

 3     p     Moment     6.0(-2.4)    4.0( 1.1)    3.2( 0.1)    2.9( 1.1)    2.7( 1.6)
             ML         7.0(-1.3)    3.7(-1.4)    1.8(-1.2)    1.0(-0.9)    0.3(-0.8)
       κ     Moment     6.7(-1.7)    6.8( 1.8)    5.8( 2.8)    4.8( 3.0)    3.7( 2.7)
             ML         6.0(-2.4)    5.7( 0.6)    4.3( 1.3)    2.5( 0.6)    1.2( 0.1)

 4     p     Moment     8.8( 0.0)    4.3(-0.8)    2.5(-0.6)    2.8( 1.0)    2.4( 1.3)
             ML         9.5( 1.4)    4.2(-0.9)    2.0(-1.1)    2.1( 0.2)    1.6( 0.5)
       κ     Moment    14.9( 7.2)    6.3( 1.3)    4.3( 1.3)    3.6( 1.8)    2.6( 1.5)
             ML        15.7( 8.2)    6.2( 1.2)    4.0( 1.0)    3.1( 1.2)    1.9( 0.8)

 5     p     Moment     7.9(-0.3)    4.3(-0.8)    3.2( 0.1)    3.7( 1.9)    2.4( 1.3)
             ML         7.1(-1.3)    2.7(-2.5)    1.5(-1.7)    1.5(-0.4)    0.1(-1.0)
       κ     Moment    13.7( 6.0)    6.6( 1.6)    4.6( 1.6)    4.4( 2.6)    2.8( 1.7)
             ML        13.3( 5.6)    5.1( 0.0)       --           --           --

Average of absolute error

       p     Moment     8.3( 1.3)    4.9( 0.9)    3.3( 0.9)    2.9( 1.6)    3.0( 1.9)
             ML         7.2( 1.6)    3.1( 2.0)    1.4( 1.8)    1.0( 0.9)    0.5( 0.9)
       κ     Moment    13.8( 6.7)    7.6( 2.7)    4.6( 1.9)    4.0( 2.2)    2.9( 1.8)
             ML        12.9( 6.1)    6.1( 1.1)    3.1( 1.0)    2.1( 0.7)    0.8( 0.7)

(a) Values in parentheses represent relative errors of approximation
when the adjustment h is used.
-- Entry not legible in the source copy.

Table 3 reports the empirical percentages of simulated z and
z* values which fall around zero with a nominal normal probability
of 80%, 90%, 95%, and 99%. (The results are reported only for the
moment estimates, which differ only slightly from those associated
with the ML estimates.) Two major points may be inferred from the
reported data. (a) The use of unadjusted asymptotic standard
errors produces z ratios which show less concentration around 0
than that predicted from a unit normal distribution. This is
consistent with the results previously reported regarding the
underapproximation associated with the unadjusted asymptotic
standard errors. This underapproximation produces z ratios with a
standard deviation slightly larger than one; hence the
corresponding distribution for these z ratios would show less
probability around the central value of zero than that of a unit
normal distribution. (b) Adjustment via the factor
$1 + 1/m^{3/4}$ results in adjusted z* ratios which cluster around
zero with (empirical) probabilities very close to the nominal
values predicted from the asymptotic normal theory. The degree of
similarity between the empirical and nominal probabilities is
quite adequate even with samples of size m = 25. The empirical and
nominal probabilities are, within sampling error, nearly identical
when the sample size is larger, say when m is 50 or higher.
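The argument in point (a) can be made concrete with a small calculation. The sketch below is an idealization, not the simulation reported in the paper: it assumes the estimate is exactly normal and that the standard error used in the z ratio is too small by a fixed fraction, so that z has standard deviation 1/(1 - fraction). The function name is hypothetical.

```python
from statistics import NormalDist


def coverage_with_underestimated_se(nominal: float, shortfall: float) -> float:
    """Probability that |z| stays inside the nominal central interval when the
    standard error used in z is too small by the fraction `shortfall`.

    Idealized assumption: the estimate is exactly normal, so z is normal with
    standard deviation 1 / (1 - shortfall).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + nominal / 2.0)
    return 2.0 * std_normal.cdf(z_crit * (1.0 - shortfall)) - 1.0


if __name__ == "__main__":
    # An 8.3% shortfall is the average reported for p at m = 25.
    for nominal in (0.80, 0.90, 0.95, 0.99):
        print(nominal, round(coverage_with_underestimated_se(nominal, 0.083), 3))
```

With an 8.3% shortfall, the nominal 80% interval captures roughly 76% of the z ratios under these assumptions, which is in the neighborhood of the unadjusted percentages reported for m = 25.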
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TABLE 3 



Empirical percentages of unadjusted (and adjusted) $z(\hat{p})$
values which fall around zero with selected nominal probabilities

      Nominal               Empirical percentage at m =
Case  Prob. (%)     25           50          100          200          400

 1      80      75.1(79.6)   77.0(79.4)   79.1(80.1)   78.8(79.7)   78.9(79.4)
        90      86.4(89.5)   87.0(88.8)   88.8(89.9)   89.2(89.8)   89.3(89.8)
        95      92.0(94.0)   92.9(94.6)   94.7(95.3)   94.3(94.8)   95.0(95.3)
        99      97.4(98.1)   98.1(98.7)   98.8(99.0)   98.7(98.9)   98.9(99.0)

 2      80      74.7(78.6)   75.9(78.7)   76.3(78.0)   77.2(78.0)   77.0(77.5)
        90      85.4(88.5)   86.6(88.5)   87.2(88.6)   87.7(88.4)   87.8(88.3)
        95      91.3(93.1)   92.2(93.4)   92.9(93.6)   93.2(93.7)   93.8(94.1)
        99      96.2(97.3)   97.7(98.0)   98.0(98.2)   98.2(98.4)   98.3(98.3)

 3      80      75.7(79.8)   78.1(80.6)   79.2(80.6)   78.8(79.6)   78.7(79.3)
        90      85.4(87.6)   89.0(90.6)   89.4(90.6)   89.2(89.7)   88.7(89.2)
        95      89.7(91.0)   93.5(94.7)   94.5(95.3)   94.6(95.0)   94.4(94.6)
        99      93.8(94.5)   97.8(98.2)   98.5(98.8)   98.7(98.8)   98.7(98.8)

 4      80      77.4(81.3)      --           --        78.6(79.6)   79.3(79.9)
        90      87.9(90.7)      --           --        88.9(89.5)   89.1(89.5)
        95      93.3(95.4)   93.8(95.4)   94.1(94.9)   94.4(94.8)   94.4(94.6)
        99      98.0(98.8)   98.7(99.0)   98.5(98.8)   98.5(98.7)   98.7(98.7)

 5      80      75.8(79.9)   78.0(80.1)   78.3(80.0)   78.7(79.6)   79.1(79.7)
        90      86.6(89.7)   88.2(89.9)   88.6(89.6)   88.5(89.1)   89.3(89.6)
        95      92.3(94.7)   93.7(94.7)   94.2(95.0)   93.7(94.3)   94.5(94.7)
        99      98.0(98.7)   98.3(98.8)      --        98.5(98.6)   98.7(98.8)

-- Entry not legible in the source copy.



4. SUMMARY AND CONCLUSION 

The study indicates that the asymptotic normal theory for the
estimation of p and κ via the estimates $\hat{p}$ and
$\hat{\kappa}$ produces asymptotic standard errors which are
slightly smaller than the actual standard errors associated with
small samples. As a result, the sampling distribution of the
z-type ratios has fewer cases around zero than is predicted by a
normal distribution. However, multiplication of the asymptotic
standard errors by the constant $1 + 1/m^{3/4}$ results in adjusted
asymptotic standard errors which show close agreement with the
actual finite-sample standard errors, even with samples as small
as 25 cases. In addition, the adjustment produces z ratios which
follow very closely a normal distribution, at least with respect
to the combined tail probabilities. This conclusion also holds for
samples as small as 25 cases.

All in all, it appears that, with the multiplicative adjustment
factor of $1 + 1/m^{3/4}$ imposed on the asymptotic standard
errors, the asymptotic normal theory for the estimation of
decision reliability in mastery testing (Huynh, 1978b, 1979) can be
used safely with samples with as few as 25 cases. This conclusion,
of course, is restricted to situations similar to those considered
here.
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ABSTRACT 

In most reliability studies, the precision of a reliability 
estimate varies inversely with the number of examinees (sample 
size). Thus, to achieve a given level of accuracy, some minimum 
sample size is required. An approximation for this minimum size 
may be made if some reasonable assumptions regarding the mean and 
standard deviation of the test score distribution can be made. 
To facilitate the computations, tables are developed based on the 
Comprehensive Tests of Basic Skills. The tables may be used for 
tests ranging in length from five to thirty items, with percent 
cutoff scores of 60%, 70%, or 80%, and with examinee populations 
for which the test difficulty can be described as low, moderate, 
or high, and the test variability as low or moderate. The tables 
also reveal that for a given degree of accuracy, an estimate of 
kappa would require a considerably greater number of examinees 
than would an estimate of the raw agreement index. 

This paper has been distributed separately as RM 80-3, March, 1980. 
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1. INTRODUCTION 

In many applications of educational and psychological testing, 
an empirical demonstration of the reliability of the measuring in- 
strument is desirable. Such demonstration is most meaningful when 
the estimate for the reliability has been obtained with a reason- 
able degree of accuracy. That is, the standard error of estimate 
must be within some acceptable limit. In most instances, the 
standard error is a decreasing function of the number of examinees 
(sample size) to be included in the reliability study. Thus, some 
minimum sample size is needed to achieve a given level of precision. 
The purpose of this paper is to illustrate how this sample size can 
be assessed in estimating the reliability of mastery tests. 

The paper consists of three major parts. The first part pre- 
sents an overview of the procedures for estimating two reliability 
indices for mastery tests by using data collected from one test ad- 
ministration. The use of the estimation process to determine the 
minimum sample size is illustrated in the second part. Finally, a 
set of tables is developed to facilitate the determination of the 
minimum sample size in reliability studies for mastery tests. 

2. OVERVIEW OF SINGLE-ADMINISTRATION 
ESTIMATES FOR RELIABILITY 

Mastery tests are commonly used to classify examinees into two
achievement categories, usually referred to as mastery and
nonmastery. The reliability of such tests is often viewed as the
consistency of mastery-nonmastery decisions. It may be quantified
via the raw agreement index (p) or the kappa index (κ). The p index
is simply the combined proportion of examinees classified
consistently as masters or nonmasters by two repeated testings
using the same form or two equivalent forms of a mastery test. The
kappa index, on the other hand, takes into account the level of
decision consistency which would result from random category
assignment. It expresses the extent to which the test scores
improve the consistency of decisions beyond the chance level.
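As a small illustration of the two indices, the following Python sketch computes p and kappa from the four joint classification proportions of two testings. The chance-agreement term follows the usual definition of Cohen's kappa (product of the marginal proportions), which is assumed here to be the one intended; the function name and the numbers in the example are hypothetical.

```python
def agreement_indices(p_mm: float, p_mn: float, p_nm: float, p_nn: float):
    """Raw agreement p and kappa from joint proportions of two testings.

    p_mm: master on both testings; p_nn: nonmaster on both;
    p_mn and p_nm: the two inconsistent classifications.
    """
    if abs(p_mm + p_mn + p_nm + p_nn - 1.0) > 1e-9:
        raise ValueError("the four proportions must sum to 1")
    p = p_mm + p_nn                       # consistent decisions
    first_master = p_mm + p_mn            # mastery rate, first testing
    second_master = p_mm + p_nm           # mastery rate, second testing
    p_chance = (first_master * second_master
                + (1.0 - first_master) * (1.0 - second_master))
    kappa = (p - p_chance) / (1.0 - p_chance)
    return p, kappa


if __name__ == "__main__":
    # Hypothetical proportions, for illustration only.
    print(agreement_indices(0.6, 0.1, 0.1, 0.2))
```

In the hypothetical example, 80% of the decisions agree, but 58% agreement would be expected by chance alone, so kappa is (.80 - .58)/(1 - .58), or about .52.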




MINIMUM SAMPLE SIZE 

Though both p and κ are defined in terms of repeated testings,
there are many practical situations in which they may be estimated
from the scores collected from a single test administration (Huynh,
1976). The estimation process assumes that the test scores conform
to a beta-binomial (negative hypergeometric) model, and may be
carried out via formulae, tables, and a computer program reported
elsewhere (Huynh, 1978, 1979). The data reported by Subkoviak
(1978) and by Huynh and Saunders (1979) tend to indicate that the
beta-binomial model yields reasonably accurate estimates for p and
κ in situations involving educational tests such as the Scholastic
Aptitude Test and the Comprehensive Tests of Basic Skills.
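The single-administration idea can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the program cited above: the beta-binomial parameters are obtained by moment estimation through KR21, two equivalent testings are treated as conditionally independent binomial scores sharing the same beta-distributed true score, and chance agreement is computed from the marginal mastery rate. All function names are hypothetical.

```python
from math import comb, exp, lgamma


def moment_estimates(n: int, mean: float, sd: float):
    """Moment estimates (alpha, beta) of the beta-binomial model via KR21."""
    kr21 = n / (n - 1) * (1.0 - mean * (n - mean) / (n * sd ** 2))
    if not 0.0 < kr21 < 1.0:
        raise ValueError("KR21 must lie strictly between 0 and 1")
    alpha = (1.0 / kr21 - 1.0) * mean
    beta = n / kr21 - n - alpha
    return alpha, beta


def _log_beta(a: float, b: float) -> float:
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def joint_probability(i: int, j: int, n: int, alpha: float, beta: float) -> float:
    """P(X1 = i, X2 = j) for two n-item testings sharing one true score."""
    log_ratio = (_log_beta(alpha + i + j, beta + 2 * n - i - j)
                 - _log_beta(alpha, beta))
    return comb(n, i) * comb(n, j) * exp(log_ratio)


def decision_consistency(n: int, mean: float, sd: float, cutoff: int):
    """Raw agreement p and kappa projected from n, mean, SD, and cutoff."""
    alpha, beta = moment_estimates(n, mean, sd)
    scores = range(n + 1)
    joint = {(i, j): joint_probability(i, j, n, alpha, beta)
             for i in scores for j in scores}
    both_master = sum(v for (i, j), v in joint.items()
                      if i >= cutoff and j >= cutoff)
    both_nonmaster = sum(v for (i, j), v in joint.items()
                         if i < cutoff and j < cutoff)
    master = sum(v for (i, _), v in joint.items() if i >= cutoff)
    p = both_master + both_nonmaster
    p_chance = master ** 2 + (1.0 - master) ** 2
    return p, (p - p_chance) / (1.0 - p_chance)


if __name__ == "__main__":
    print(decision_consistency(6, 5.0, 1.2, 5))
```

For a six-item test with mean 5.0, standard deviation 1.2, and cutoff score 5, this sketch gives values close to p = .7532 and κ = .3778, the population values quoted in the illustration of Section 3.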

The beta-binomial model also provides asymptotic (large sample) 
standard errors for the estimates. Simulation studies indicate that 
the asymptotic standard errors tend to underestimate the actual 
standard errors when the sample size is small (Huynh, 1980). The 
degree of underestimation is not substantial when the sample has 
sixty or more examinees. Since the beta-binomial model will be 
used throughout the remaining part of this paper, a minimum sample 
size of sixty examinees will be assumed to hold uniformly for all 
cases under consideration. 

3. ILLUSTRATIONS FOR SAMPLE SIZE 
DETERMINATION 

The standard errors (s.e.) of the estimates for p and for κ are
functions of sample size m. The quantity $G = \text{s.e.} \times
\sqrt{m}$ is asymptotically (i.e., in large samples) a constant,
however. This constant depends only on the number of items (n), the
mean (μ) and standard deviation (σ) of the test scores, and the
cutoff score (c). Given the availability of these parameters, the
value of G may be determined via the tables or the computer program
presented elsewhere (Huynh, 1978). Once G is determined, a minimum
sample size m can be calculated which will restrict the standard
error of estimate to whatever tolerable range is required.

Suppose, for example, that an estimate of κ is needed for a
short (n = 6 items) test to be used with a particular population of
students. Passing or mastery on the test is to be granted if an
examinee attains a score of 5 or 6. Further, suppose that we want
the standard error of this estimate to be no larger than 10% of κ,
that is, $\text{s.e.}(\hat{\kappa}) \le .10\kappa$.

What sample size would be needed to obtain the specified
degree of accuracy in the estimate? To answer this question using
the above-mentioned Huynh procedure, a preliminary knowledge of
the test mean and standard deviation is needed. Suppose past data
suggest that the students are generally well-prepared on the
content of the test in question and can be expected to be fairly
homogeneous in achievement. We might suppose that in the population
the mean will be 5.0 and the standard deviation will be 1.2. Using
these values, and the cutoff score of 5, a value of G can be read
from the tables (or computed): G(κ) = .7390. If the population
mean and standard deviation are as given, then, assuming the
beta-binomial model, the population value of κ is .3778. These
results are then used to estimate the sample size needed to bring
the standard error of estimate within the desired limits (i.e., no
more than .10κ).

Since the standard error of estimate is approximately
$G/\sqrt{m}$, the sample size m must be such that

    $G(\kappa)/\sqrt{m} \le .10\kappa$

or, equivalently,

    $m \ge [G(\kappa)/(.10\kappa)]^2$.

For this example, then,

    $m \ge [.7390/((.10)(.3778))]^2 = 382.62$.

Thus, to have no more than 10% relative error requires that at
least 383 examinees be tested to estimate κ.

A similar computation can be made for
$\text{s.e.}(\hat{p}) \le .10p$ when the above assumed population
values hold. Thus, using the tables,

    G(p) = .3210,
    p = .7532,

and

    $m \ge [G(p)/(.10p)]^2 = 18.16$.

Because of the previously mentioned problems of underestimation in
small samples, a sample size of at least sixty is recommended
regardless of the above computation.
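The two computations above can be reproduced with a short Python sketch. The function name and its `floor` argument are hypothetical; the floor of sixty examinees simply encodes the recommendation made in the text.

```python
from math import ceil


def minimum_sample_size(G: float, index: float, relative_error: float,
                        floor: int = 60) -> int:
    """Smallest m with G / sqrt(m) <= relative_error * index.

    `index` is the projected population value of p or kappa; `floor` is the
    smallest sample size accepted regardless of the computation.
    """
    required = (G / (relative_error * index)) ** 2
    return max(floor, ceil(required))


if __name__ == "__main__":
    print(minimum_sample_size(0.7390, 0.3778, 0.10))           # kappa
    print(minimum_sample_size(0.3210, 0.7532, 0.10))           # p, floor applies
    print(minimum_sample_size(0.3210, 0.7532, 0.10, floor=0))  # p, no floor
```

The first call returns 383, matching the text; the second returns 60 because the computed value (about 18.2) is below the recommended minimum.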

It might be disheartening to note that a much larger sample
size is needed to keep the standard error of the κ estimate within
the desired limits than is required when an estimate of p is used.
However, the standard error for κ is much larger than that of p
(Huynh, 1978). Thus, for the same relative size of errors of
estimation, larger samples are needed to estimate κ than to
estimate p. It could be argued that the same degree of accuracy of
estimation is not required. If so, then a less accurate estimate
of κ would allow a smaller sample size.

The above illustration presumes that the mean and standard
deviation of the test scores can be projected prior to the real
test administration. In a number of instances involving the use of
standardized tests for a heterogeneous group of students, reasonable
assumptions may be made which will yield projected values for both
μ and σ. For example, when an n-item multiple-choice test is built
to maximize the discrimination among individual examinees, it is
not unreasonable to assume that the test mean is halfway between
the expected chance score and the maximum score n, and that the
standard deviation is about one-sixth of the test score range from
0 to n. (If there are A options per item, the expected chance score
is n/A.) In other words, it is not unreasonable to presume that

    $\mu = (n + n/A)/2$

and

    $\sigma = n/6$.

For example, consider a test consisting of 10 four-option items.
Then A = 4, and the projected mean and standard deviation are
μ = 6.25 and σ = 1.66667. Presuming a cutoff score of c = 6, it
may be found that p = .6140, G(p) = .3661, κ = .1118, and G(κ) =
.8213. If a relative error of 5% is acceptable for p, then a
sample of at least $[.3661/(.05 \times .6140)]^2 = 143$ students
would be needed. On the other hand, a relative error of 25% for
kappa would require $[.8213/(.25 \times .1118)]^2 = 864$ students.
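The projection rule for the mean and standard deviation is simple enough to state as code. The sketch below only restates the two formulas given above; the function name is hypothetical.

```python
def projected_mean_sd(n_items: int, n_options: int):
    """Projected mean and SD for a test built to maximize discrimination.

    Mean: halfway between the chance score n/A and the maximum score n.
    SD: one-sixth of the score range from 0 to n.
    """
    chance_score = n_items / n_options
    mean = (n_items + chance_score) / 2.0
    sd = n_items / 6.0
    return mean, sd


if __name__ == "__main__":
    print(projected_mean_sd(10, 4))
```

For 10 four-option items this gives a mean of 6.25 and a standard deviation of about 1.66667, the values used in the example.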

4. PRACTICAL CONSIDERATIONS IN SETTING SAMPLE 
SIZE IN BASIC SKILLS TESTING 

Some general formulae are given for expressing the relation- 
ships among s.e., G, m, p, k, and the proportion of sampling error 
desired in an estimate. These general expressions will then be 
used in a series of simulations designed to explore their typical 
numerical values for real tests. Tables are developed to help the 
practitioner decide on the sample size needed to obtain estimates 
of p and k for various degrees of precision. 

General expressions 

Since $G = \text{s.e.} \times \sqrt{m}$ is a constant for large
samples, this expression forms the basis for the formulations in
this section. In the previous section .10 and .05 were used as
examples of desired degrees of precision for a sample estimate of
p. In general, we will call this quantity γ, using $\gamma_p$ and
$\gamma_\kappa$ to distinguish precisions desired for p and κ,
respectively. Thus, the general expressions for minimum sample
size are:

    $m \ge [G(p)/(\gamma_p \, p)]^2$

and

    $m \ge [G(\kappa)/(\gamma_\kappa \, \kappa)]^2$.

A further simplification is to let $R(p) = [G(p)/p]^2$ and
$R(\kappa) = [G(\kappa)/\kappa]^2$. The above expressions for
minimum sample size, m, become

    $m \ge R(p)/(\gamma_p)^2$

and

    $m \ge R(\kappa)/(\gamma_\kappa)^2$.

These expressions will allow minimum sample size to be determined
from knowledge of two quantities, R and γ.

Determining typical values of R(p) and R(κ)

In practical applications, the values R(p) and R(κ) depend on
a test score distribution which is not yet available. So, as in the
previous section, conjectures must be made regarding the mean and
standard deviation of the test score in order to project the minimum
sample size.

In this section, typical values for R(p) and R(κ) will be
reported for practical testing situations involving the assessment
of basic skills. Several combinations of test length, difficulty,
variability, and cutoff scores will be used. To arrive at the
values of R(p) and R(κ) reported in Tables 1-3, the following series
of steps was taken.

First, a series of subtests was developed, using items found 
in the Comprehensive Test of Basic Skills (CTBS), Form S, Level 1. 
The items composing each subtest were randomly selected from one of 
five CTBS content areas, to reflect a variety of subjects and 
skills. For each content area, subtests were constructed with 5, 
10, 15, 20, 25, and 30 items, producing a total of 30 subtests. 

Second, the administration of the subtests was simulated 
using actual student responses. Data for the simulation came from 
5,543 students, comprising a systematic sample (every tenth case) 
of the third grade students tested using Level 1 of the CTBS by 
the 1978 South Carolina Statewide Testing Program. From the 
students' responses to each item in the CTBS, raw scores were gen- 
erated for each student on all 30 subtests. 

Third, values of the mean and standard deviation of raw scores
on each test were obtained. District means and standard deviations
were calculated for each school district with 40 or more students
in the sample. For each of the 30 subtests, means and standard
deviations were plotted in a bivariate scatter diagram. The
scatter-plots were divided into areas representing different
categories of test difficulty and variability. Then districts were
selected with means and standard deviations considered to be typical
of six categories of difficulty and variability. These six
categories (tests of low, moderate, and high difficulty, with low
and moderate variability) were chosen to represent types of test
score distributions typically encountered in mastery testing.

Fourth, the typical values obtained in the previous step were 
used to determine R(p) and R(k) . For each of the 30 subtests, the 
computer program described elsewhere (Huynh, 1978) was used to 
obtain estimates of G(p), p, G(k) , and k when the cutoff scores 
were equivalent to 60%, 70%, and 80%. These data were used to 
calculate R(p) and R(k) in each case. 

Finally, the values of R(p) and R(k) obtained above were 
averaged over the five CTBS content areas and the resulting values 
were compiled in tabular form. Tables 1, 2, and 3 provide values 
of R(p) and R(k) for percent cutoff scores of 60%, 70%, and 80%, 
respectively. 

The data needed to enter the tables are: (1) test length
(n), (2) an idea of test difficulty (high, moderate, or low), (3)
test variability (low or moderate), and (4) percentage cutoff
score (60%, 70%, or 80%). The minimum sample size needed is simply
$R/\gamma^2$; that is, the value of R obtained from the tables
divided by the square of the acceptable proportion of sampling
error in the estimate.

Numerical example 

Suppose a study is planned to assess the reliability of a
twenty-item test (n = 20) using the kappa index when a cutoff score
of 14 (c = 70%) is employed. The students for whom the test is
intended are known to be a homogeneous group of relatively high
ability. Thus, it might be expected that the test would be of low
difficulty (i.e., easy), with low variability. Let us say that a
fairly precise estimate of κ is desired, so $\gamma_\kappa$ is set
at .05. Entering Table 2, in the row corresponding to low
difficulty and low variability, it is found that R(κ) for n = 20
items is .362. The minimum sample size needed to estimate kappa
with 5% allowable error is then computed as
$m = R(\kappa)/\gamma_\kappa^2 = .362/(.05)^2 = 144.8$. Thus, a
sample of at least 145 students is necessary to achieve the desired
degree of precision. If reliability is to be determined via the
raw agreement index p, a similar procedure is followed using R(p)
and $\gamma_p$. Again, at least 60 students should be used in the
sample, even if it is found that m < 60.

TABLE 1

Values of R for p and κ for Six Categories of
Tests at the Percent Cutoff Score of 60%

Test Category                        Number of Items
(diff)  (var)  Index      5       10      15      20      25      30

High    Low    (p)      0.219   0.075   0.050   0.031   0.023   0.018
               (κ)      5.349   1.623   0.666   0.391   0.307   0.209
High    Mod    (p)      0.164   0.061   0.036   0.025   0.018   0.014
               (κ)      2.589   0.908   0.327   0.280   0.209   0.139
Mod     Low    (p)      0.244   0.085   0.056   0.032   0.025   0.020
               (κ)      5.809   1.485   0.613   0.367   0.269   0.200
Mod     Mod    (p)      0.148   0.068   0.036   0.027   0.021   0.015
               (κ)      2.215   0.838   0.312   0.266   0.198   0.126
Low     Low    (p)      0.199   0.095   0.044   0.031   0.025   0.020
               (κ)      5.502   1.345   0.560   0.365   0.247   0.186
Low     Mod    (p)      0.142   0.068   0.032   0.024   0.020   0.016
               (κ)      2.371   0.770   0.298   0.249   0.176   0.128
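The computation in the numerical example (R(κ) = .362, γ = .05) can be written as a one-line rule. The sketch below is illustrative only; the function name and the `floor` argument are hypothetical, with the floor encoding the sixty-examinee recommendation.

```python
from math import ceil


def sample_size_from_R(R: float, gamma: float, floor: int = 60) -> int:
    """Minimum sample size m >= R / gamma^2, never below `floor`."""
    # round() guards against floating-point noise just above an integer
    required = ceil(round(R / gamma ** 2, 9))
    return max(floor, required)


if __name__ == "__main__":
    print(sample_size_from_R(0.362, 0.05))  # kappa, 20 items, 70% cutoff
    print(sample_size_from_R(0.031, 0.05))  # p, same test: floor applies
```

The first call returns 145, as in the example. For the raw agreement index with the same test (R(p) = .031 in the low difficulty, low variability row), the computed value is only 12.4, so the recommended minimum of 60 governs.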
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TABLE 2 

Values of R for p and κ for Six Categories of
Tests at the Percent Cutoff Score of 70%

Test Category                        Number of Items
(diff)  (var)  Index      5       10      15      20      25      30

High    Low    (p)      0.219   0.075   0.046   0.029   0.022   0.017
               (κ)      5.349   1.623   0.776   0.455   0.410   0.272
High    Mod    (p)      0.164   0.061   0.033   0.023   0.017   0.013
               (κ)      2.589   0.908   0.360   0.324   0.276   0.178
Mod     Low    (p)      0.244   0.085   0.053   0.031   0.023   0.019
               (κ)      5.809   1.485   0.646   0.396   0.322   0.242
Mod     Mod    (p)      0.148   0.068   0.035   0.026   0.019   0.014
               (κ)      2.215   0.838   0.321   0.289   0.237   0.149
Low     Low    (p)      0.199   0.095   0.050   0.031   0.024   0.019
               (κ)      5.502   1.345   0.512   0.362   0.265   0.203
Low     Mod    (p)      0.142   0.068   0.036   0.023   0.019   0.015
               (κ)      2.371   0.770   0.280   0.254   0.190   0.137



TABLE 3

Values of R for p and κ for Six Categories of
Tests at the Percent Cutoff Score of 80%

Test Category                        Number of Items
(diff)  (var)  Index      5       10      15      20      25      30

High    Low    (p)      0.132   0.063   0.032   0.021   0.018   0.013
               (κ)      7.076   2.805   1.494   1.055   0.887   0.660
High    Mod    (p)      0.098   0.045   0.024   0.018   0.015   0.011
               (κ)      3.510   1.678   0.608   0.717   0.568   0.404
Mod     Low    (p)      0.174   0.064   0.038   0.025   0.020   0.015
               (κ)      6.831   2.283   1.087   0.812   0.640   0.558
Mod     Mod    (p)      0.113   0.047   0.026   0.021   0.017   0.012
               (κ)      2.633   1.337   0.484   0.571   0.458   0.311
Low     Low    (p)      0.189   0.060   0.044   0.029   0.022   0.017
               (κ)      5.849   1.906   0.652   0.611   0.471   0.417
Low     Mod    (p)      0.122   0.046   0.029   0.023   0.018   0.014
               (κ)      2.675   1.113   0.348   0.430   0.325   0.248

Some observations on the tabled values

In every case R(κ) > R(p). This fact implies that the sample
size necessary to estimate kappa will be larger than that needed to
estimate p, for any fixed degree of precision, γ. As noted
previously, practical limitations may require that larger
proportions of error be tolerated when estimating kappa than when
estimating p.

R-values for the case of low variability are larger than those
for moderate variability. If there is doubt about the expected
degree of variability, the value of R for the low variability case
would produce the more conservative estimate of m.

R decreases as the number of test items increases. The
relationship between R and n is not linear, however. Hence, linear
interpolation would not be appropriate for determining R for
nontabled values of n. The value of R listed for the largest tabled
n less than the actual number of items should yield a conservative
estimate for m. For example, suppose the test considered in the
numerical example above actually contained 22 items. The tabled
value of R corresponding to n = 25 would produce an underestimate
of m, and the resulting proportion of error in estimating kappa
would exceed $\gamma_\kappa$. The R-value for n = 20 would
overestimate m, and the observed proportion of error would then be
less than $\gamma_\kappa$.

The relationships between R and test difficulty or cutoff scores
are more complex. No simple trends can be observed in the tables.
In many testing situations, the cutoff score typically ranges from
60% to 80% correct. For cutoff scores falling between the values
in the tables, find R for both bracketing values and use the larger.
Again, consider the situation in the numerical example above.
Suppose the cutoff score was 13 (65% correct). From Tables 1 and
2, the values of R corresponding to c = 60% and 70% are .365 and
.362, respectively. The larger of these (corresponding to c = 60%)
should provide a reasonable value for R.
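The two lookup rules just described (use the largest tabled n not exceeding the actual test length, and take the larger R of the two bracketing cutoff percentages) can be sketched as follows. Only the R(κ) rows for low difficulty and low variability are transcribed from Tables 1 to 3; the function and variable names are hypothetical.

```python
from bisect import bisect_right

TABLED_N = (5, 10, 15, 20, 25, 30)

# R(kappa), low difficulty / low variability, keyed by percent cutoff score
R_KAPPA_LOW_LOW = {
    60: (5.502, 1.345, 0.560, 0.365, 0.247, 0.186),
    70: (5.502, 1.345, 0.512, 0.362, 0.265, 0.203),
    80: (5.849, 1.906, 0.652, 0.611, 0.471, 0.417),
}


def conservative_R(n_items: int, cutoff_pct: float,
                   table=R_KAPPA_LOW_LOW) -> float:
    """Conservative R for a test length or cutoff that is not tabled."""
    if n_items < TABLED_N[0]:
        raise ValueError("test is shorter than the shortest tabled test")
    cutoffs = sorted(table)
    if not cutoffs[0] <= cutoff_pct <= cutoffs[-1]:
        raise ValueError("cutoff lies outside the tabled range")
    # Largest tabled n that does not exceed the actual number of items.
    column = bisect_right(TABLED_N, n_items) - 1
    # Bracketing cutoffs; both equal cutoff_pct when it is tabled exactly.
    lower = max(c for c in cutoffs if c <= cutoff_pct)
    upper = min(c for c in cutoffs if c >= cutoff_pct)
    return max(table[lower][column], table[upper][column])


if __name__ == "__main__":
    print(conservative_R(22, 65))  # 22 items, cutoff of 65% correct
```

For a 22-item test with a 65% cutoff, the lookup falls back to the n = 20 column and returns .365, the larger of the 60% and 70% entries, as in the text.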

5. CONCLUSIONS

In this paper, an approximation method has been presented for
determining the minimum sample size necessary to achieve a
specified degree of precision in estimating raw agreement (p) and
kappa (κ) indices of reliability for mastery tests. The method uses
the quantity R, which can be calculated for known test score
distributions. Tables of R have been constructed for test score
distributions typically found in mastery testing, for a variety of
test lengths and cutoff scores. In addition, suggestions have been
made for obtaining reasonable estimates of R for situations not
directly covered by the tables.

Of course, precision is only one of the factors that must be 
considered in any study. Feasibility, cost, and classroom manage- 
ment considerations also play important roles. However, knowledge 
of necessary sample sizes should facilitate and simplify the 
planning of reliability studies. The tables presented here should 
be particularly useful for tests involving the basic skills, and 
perhaps other tests of similar construction. 
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STATISTICAL INFERENCE FOR FALSE POSITIVE AND 
FALSE NEGATIVE ERROR RATES IN MASTERY TESTING 
(COMPUTER PROGRAMS AND TABLES ADDED) 



Huynh Huynh 
University of South Carolina 



This paper describes an asymptotic inferential procedure *or 
the estimates of the r lse positive and false negative error rates. 
Formulae and tables are described for the computations of the stan- 
dard errors. A simulation study indicates that the asymptotic 
standard errors may be used even with samples of 25 cases as long 
as the Kuder-Richardson Formula 21 reliability is reasonably large. 
Otherwise, a large sample would be required. 



A primary purpose of mastery testing is to use test data in 
order to classity an examinee in one of several achievement (or 
ability) categories. Typically there are two such categories, 
mastery and nonmastery. For example, let 8 be the true ability of 
a person. Then true nonmastery status is defined by the condition 
6 < 6 Q and crue mastery by 0 > G q , G q being a given constant often 
referred to as a criterion level . In the reality of testing, 

This paper has been distributed separately as RM 79-6, July, 1979. 



Psychometrika , March 1980* 



ABSTRACT 



1. 



INTRODUCTION 






however, decisions are normally made on the basis of the observed 
test data. Let $x$ be the test score and $c$ an appropriately chosen 
passing (or mastery) score. Then nonmastery status is declared if 
$x < c$ and mastery status is granted if $x \ge c$. A correct decision 
on the basis of test data is made when $\theta < \theta_0$ and $x < c$ or when 
$\theta \ge \theta_0$ and $x \ge c$. The other two situations represent errors in 
classification: a false positive error is committed when $\theta < \theta_0$ 
and $x \ge c$; a false negative error is encountered when $\theta \ge \theta_0$ and 
$x < c$. 

The likelihood (or rate) of false positive and false negative 
errors may be assessed via several schemes. For example, using the 
binomial error model and the notion of an indifference zone , it is 
possible to compute the maximum error rates in classification for 
an individual (Wilcox, 1976). On the other hand, the error rates 
for a group of examinees may be assessed if a reasonable form for 
the (group) distribution of $\theta$ is available. Such is the case of 
the beta-binomial model (Keats & Lord, 1962) explored by Huynh 
(1976a, 1976b, 1977a, 1978) and Wilcox (1977) in several technical 
problems regarding mastery testing. 

The beta-binomial model requires that test items be exchangeable, 
i.e., they can replace each other without changing the 
distribution of test scores. Item exchangeability implies that the 
items are equally difficult. This condition can be considered only 
as approximately satisfied in most testing situations. However, 
there are indications (Keats & Lord, 1962; Duncan, 1974) that 
several test score distributions fit into the beta-binomial model 
adequately. There are more complex models (Lord, 1965, 1969) which 
take into account variation in item difficulty. However, as far as 
estimation of error rates is concerned, the data in Wilcox (1977) 
seem to suggest that the more complex models do not substantially 
increase the accuracy of the estimates. 

The purpose of this paper is to describe an asymptotic inferential 
procedure for false positive and false negative error rates. 
The beta-binomial model is used as a vehicle for computation. 
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2. COMPUTATIONS FOR ERROR RATES 

Let $n$ be the number of test items randomly selected from an 
item pool, $\theta$ (true ability) be the true proportion of items in the 
total item pool that would be answered correctly by an examinee, 
and $x$ be the examinee's observed test score. Then the conditional 
density of $x$ is given as 

$$f(x \mid \theta) = \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}, \qquad x = 0, 1, \ldots, n.$$

Let the density $p$ of $\theta$ be of the beta form with parameters $\alpha$ and $\beta$, 
i.e., 

$$p(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}, \qquad 0 < \theta < 1.$$

Both $\alpha$ and $\beta$ are positive constants. The joint density of $(x,\theta)$ is 
given as 

$$g(x,\theta) = \frac{1}{B(\alpha,\beta)}\binom{n}{x}\,\theta^{\alpha+x-1}(1-\theta)^{n+\beta-x-1}.$$

With the criterion level $\theta_0$ and passing score $c$, the false positive 
error rate is given as 

$$F_p = P(x \ge c,\ \theta < \theta_0) = \frac{1}{B(\alpha,\beta)}\sum_{x=c}^{n}\binom{n}{x}\int_0^{\theta_0}\theta^{\alpha+x-1}(1-\theta)^{n+\beta-x-1}\,d\theta.$$

Let 

$$D(u,v;\theta_0) = \int_0^{\theta_0} t^{u-1}(1-t)^{v-1}\,dt.$$

Then 

$$F_p = \frac{1}{B(\alpha,\beta)}\sum_{x=c}^{n}\binom{n}{x}\,D(\alpha+x,\ n+\beta-x;\ \theta_0).$$

As for the likelihood $F_n$ of a false negative error, it may be 
noted that 

$$F_n = P(x \le c-1,\ \theta \ge \theta_0) = \frac{1}{B(\alpha,\beta)}\sum_{x=0}^{c-1}\binom{n}{x}\int_{\theta_0}^{1}\theta^{\alpha+x-1}(1-\theta)^{n+\beta-x-1}\,d\theta.$$

Let $\xi = 1-\theta$, $\xi_0 = 1-\theta_0$, $y = n-x$, and $d = n-c+1$. Then it may be 
verified that 

$$F_n = \frac{1}{B(\alpha,\beta)}\sum_{y=d}^{n}\binom{n}{y}\int_0^{\xi_0}\xi^{\beta+y-1}(1-\xi)^{n+\alpha-y-1}\,d\xi.$$

From this it may be seen that $F_n$ may be computed in exactly the 
same way as $F_p$. 

The computations of $F_p$ can be carried out with some degree of 
efficiency by noting that 

$$D(u+1,\ v-1;\ \theta_0) = \frac{-\theta_0^{u}(1-\theta_0)^{v-1} + u\,D(u,v;\theta_0)}{v-1}$$

and that 

$$D(u,v;\theta_0) = B(u,v)\,I(u,v;\theta_0).$$

In this formula, $I(u,v;\theta_0)$ denotes the incomplete beta function as 
tabulated in Pearson (1934) and implemented via the IBM subroutine 
BDTR (1970) or the IMSL subroutine MDBETA (1977). 
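The error-rate formulas of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and composite Simpson quadrature stands in for the incomplete beta routines (BDTR, MDBETA) named in the text. It assumes $c \ge 1$ and $\theta_0 < 1$, so that the integrands stay bounded on the ranges integrated.

```python
import math

def beta_fn(a, b):
    """Complete beta function B(a, b), computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def error_rates(n, c, theta0, alpha, beta):
    """Return (F_p, F_n) under the beta-binomial model (assumes c >= 1)."""
    def joint(x, lo, hi):
        # Integral of the joint density kernel for score x over [lo, hi].
        kern = lambda t: t ** (alpha + x - 1) * (1 - t) ** (n + beta - x - 1)
        return math.comb(n, x) * simpson(kern, lo, hi)

    norm = beta_fn(alpha, beta)
    f_p = sum(joint(x, 0.0, theta0) for x in range(c, n + 1)) / norm
    f_n = sum(joint(x, theta0, 1.0) for x in range(0, c)) / norm
    return f_p, f_n

# Five-item illustration of Section 5: alpha = 1.611, beta = .857,
# theta_0 = .80, c = 4.
f_p, f_n = error_rates(5, 4, 0.80, 1.611, 0.857)
```

A production routine would more likely use the recurrence for $D$ given above together with a library incomplete beta function; the quadrature here is chosen only to keep the sketch self-contained.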

3. ASYMPTOTIC STATISTICAL INFERENCE FOR ESTIMATES 

Maximum likelihood estimation for $\alpha$ and $\beta$ has been considered 
by several authors including Griffiths (1973). A fairly efficient 
computer routine is described in Huynh (1977b). The data generated 
by Huynh indicate that the maximum likelihood estimates of $\alpha$ and $\beta$ 
and the moment estimates $\hat\alpha$ and $\hat\beta$ do not differ markedly from each 
other when the number $m$ of examinees is reasonably large. Hence, for 
the numerical examples described in this paper, only $\hat\alpha$ and $\hat\beta$ shall 
be used. They are to be computed as follows. Let $\bar{x}$ and $s$ be the mean 
and standard deviation of the test score, and let 

$$\hat\alpha_{21} = \frac{n}{n-1}\left[1 - \frac{\bar{x}(n-\bar{x})}{n s^{2}}\right]$$

be the estimated KR21 reliability. Then the moment estimates are 

$$\hat\alpha = \left(-1 + 1/\hat\alpha_{21}\right)\bar{x}$$

and 

$$\hat\beta = -\hat\alpha + n/\hat\alpha_{21} - n.$$

The estimates are positive when $0 < \hat\alpha_{21} < 1$. (If the computed 
value for $\hat\alpha_{21}$ is zero or negative, replace it by the smallest 
positive estimate of reliability which happens to be available.) 
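As a minimal sketch of the moment estimates, the following Python function works from a score frequency distribution. The function name is hypothetical, and the use of the divisor $m - 1$ in the variance is an assumption (it reproduces the mean and standard deviation quoted for the five-item data of Section 5).

```python
import math

def moment_estimates(freqs):
    """Moment estimates from freqs[x] = number of examinees scoring x (x = 0..n).

    Returns (mean, sd, kr21, alpha_hat, beta_hat).
    """
    n = len(freqs) - 1                      # number of items
    m = sum(freqs)                          # number of examinees
    mean = sum(x * f for x, f in enumerate(freqs)) / m
    # Assumption: sample variance with divisor m - 1.
    var = sum(f * (x - mean) ** 2 for x, f in enumerate(freqs)) / (m - 1)
    kr21 = n / (n - 1) * (1 - mean * (n - mean) / (n * var))
    if not 0 < kr21 < 1:
        raise ValueError("KR21 outside (0, 1); moment estimates are not positive")
    alpha_hat = (-1 + 1 / kr21) * mean
    beta_hat = -alpha_hat + n / kr21 - n
    return mean, math.sqrt(var), kr21, alpha_hat, beta_hat

# Frequencies of scores 0..5 from the numerical illustration (m = 91).
mean, sd, kr21, a_hat, b_hat = moment_estimates([4, 14, 9, 17, 21, 26])
```

The sketch raises an error when KR21 is not in $(0, 1)$ rather than substituting another reliability value, since the choice of substitute is left open in the text.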

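For reasons previously mentioned, general sampling properties 
appropriate for the maximum likelihood estimates would be applicable 
to $\hat\alpha$ and $\hat\beta$. For example, $\sqrt{m}\,(\hat\alpha-\alpha,\ \hat\beta-\beta)$ follows asymptotically a 
bivariate normal distribution with zero mean and covariance matrix 

$$\|\sigma_{pq}\| = \|b_{pq}\|^{-1},$$

where, with $f(x)$ denoting the marginal density of the test score $x$, 

$$b_{11} = \sum_{x=0}^{n}\left[\frac{\partial f(x)}{\partial\alpha}\right]^{2}\Big/ f(x),$$

$$b_{12} = \sum_{x=0}^{n}\frac{\partial f(x)}{\partial\alpha}\,\frac{\partial f(x)}{\partial\beta}\Big/ f(x),$$

and 

$$b_{22} = \sum_{x=0}^{n}\left[\frac{\partial f(x)}{\partial\beta}\right]^{2}\Big/ f(x).$$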

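Now let $F_p = Z(\alpha,\beta)$ be the function of $(\alpha,\beta)$ defining the 
false positive error rate. Let $\hat{F}_p = Z(\hat\alpha,\hat\beta)$ be the estimate of $F_p$ 
computed on the basis of $(\hat\alpha,\hat\beta)$. Then it may be deduced (Rao, 1973, 
p. 386-387) that $\sqrt{m}\,(\hat{F}_p - F_p)$ asymptotically follows a normal 
distribution with zero mean and with variance 

$$V_{fp}^{2} = \sigma_{11}\left(\frac{\partial F_p}{\partial\alpha}\right)^{2} + 2\sigma_{12}\,\frac{\partial F_p}{\partial\alpha}\,\frac{\partial F_p}{\partial\beta} + \sigma_{22}\left(\frac{\partial F_p}{\partial\beta}\right)^{2}.$$

It may then be said that the estimate $\hat{F}_p$ has an approximate normal 
distribution with mean $F_p$ and standard deviation (standard error) 
$\sigma_\infty(\hat{F}_p) = V_{fp}/\sqrt{m}$. An estimated standard error for $\hat{F}_p$, namely 
$s_\infty(\hat{F}_p)$, may be obtained by replacing $(\alpha,\beta)$ by the estimates $(\hat\alpha,\hat\beta)$ 
in the above formula defining $\sigma_\infty(\hat{F}_p)$. 

The computations described above also apply to the rate of 
false negative error. Let $F_n$ and $\hat{F}_n$ be the true and estimated 
values for this error rate. Then $\sqrt{m}\,(\hat{F}_n - F_n)$ asymptotically follows 
a normal distribution with zero mean and with variance 

$$V_{fn}^{2} = \sigma_{11}\left(\frac{\partial F_n}{\partial\alpha}\right)^{2} + 2\sigma_{12}\,\frac{\partial F_n}{\partial\alpha}\,\frac{\partial F_n}{\partial\beta} + \sigma_{22}\left(\frac{\partial F_n}{\partial\beta}\right)^{2}.$$

In addition, let $\rho$ be the correlation between the estimated false 
positive and false negative error rates. Then it may be noted that 
$\rho = \mathrm{cov}(\hat{F}_p,\hat{F}_n)/(V_{fp}V_{fn})$ where 

$$\mathrm{cov}(\hat{F}_p,\hat{F}_n) = \sigma_{11}\,\frac{\partial F_p}{\partial\alpha}\,\frac{\partial F_n}{\partial\alpha} + \sigma_{12}\left(\frac{\partial F_p}{\partial\alpha}\,\frac{\partial F_n}{\partial\beta} + \frac{\partial F_p}{\partial\beta}\,\frac{\partial F_n}{\partial\alpha}\right) + \sigma_{22}\,\frac{\partial F_p}{\partial\beta}\,\frac{\partial F_n}{\partial\beta}.$$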

4. COMPUTATIONS FOR THE PARTIAL DERIVATIVES 

The computation of $V_{fp}$, $V_{fn}$, and $\rho$ requires the partial derivatives 
of $Z(\alpha,\beta)$ with respect to $\alpha$ and $\beta$. These derivatives, in 
turn, are based on the partial derivatives of $D(\alpha+x,\ n+\beta-x;\ \theta_0)$ and 
$B(\alpha,\beta)$ with respect to $\alpha$ and $\beta$. 

4.1. Partial Derivatives of $B(\alpha,\beta)$ 

With 

$$B(\alpha,\beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt$$

it may be deduced that 

$$\frac{\partial B}{\partial\alpha} = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\log t\;dt$$

and that 

$$\frac{\partial B}{\partial\beta} = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\log(1-t)\;dt.$$

Let $\psi$ be the Euler psi function as defined and tabled in Abramowitz 
and Stegun (1968, p. 258, Section 6.3 and Table 6.1). Then according 
to Gradshteyn and Ryzhik (1965, p. 538, Section 4.253, Formula 1), 

$$\frac{\partial B}{\partial\alpha} = B(\alpha,\beta)\big(\psi(\alpha) - \psi(\alpha+\beta)\big)$$

and 

$$\frac{\partial B}{\partial\beta} = B(\alpha,\beta)\big(\psi(\beta) - \psi(\alpha+\beta)\big).$$

Formulae are also available in these texts which are useful in 
computer programming the psi function. For the present paper, the 
following steps have been adopted. 

1. First the argument of $\psi(\cdot)$ is reduced to a value in the 
half closed interval $[1,2)$ by using the formula 
$\psi(z+1) = \psi(z) + 1/z$. 

2. If $z = 1$, then $\psi(1) = -.5772156649$. 

3. For an argument $1 + z$ with $0 < z \le .75$ (that is, an argument 
between 1 and 1.75), the following series expansion is used 

$$\psi(1+z) = \psi(1) + \sum_{n=2}^{\infty}(-1)^{n}\zeta(n)\,z^{n-1}$$

where $\zeta(\cdot)$ is the Riemann zeta function tabulated in 
Abramowitz and Stegun (1968, p. 811, Table 23.3). If the 
series is stopped at the term $z^{N}$, the error cannot exceed 
$\zeta(N)z^{N} < 1.21\,z^{N}$ $(N \ge 4)$. For this paper, ten significant 
decimals are adopted for $\psi$. The value for $N$ is 
$-23.21647129/\log z + 1$ which cannot exceed 82. 

4. For $1.75 < z < 2$, the four-point Lagrange interpolation is 
used to compute $\psi$ on the basis of tabled values of $\psi$ for 
$z = 1.745\,(.005)\,2.010$. Let $\psi_{-1}$, $\psi_0$, $\psi_1$, and $\psi_2$ be four 
consecutive tabled values of $\psi$ with $\psi_0$ corresponding to $z_0$. 
Then for any $p$, $0 < p < 1$, 

$$\psi(z_0 + .005p) = -\frac{p(p-1)(p-2)}{6}\,\psi_{-1} + \frac{(p^{2}-1)(p-2)}{2}\,\psi_0 - \frac{p(p+1)(p-2)}{2}\,\psi_1 + \frac{p(p^{2}-1)}{6}\,\psi_2$$

(Abramowitz and Stegun, p. 879, Section 25.2.13). According 
to these authors (p. 270), this procedure yields ten 
significant decimals for the psi function. 
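The recurrence and series steps for the psi function can be sketched as follows. This is not the routine used in the paper: in place of the tabled values and Lagrange interpolation of step 4, the sketch shifts the argument so that the series variable satisfies $|z| \le .5$, where the zeta series converges quickly. The helper names are hypothetical, and the zeta values are computed by a partial sum with an Euler-Maclaurin tail rather than read from a table.

```python
import math

EULER_GAMMA = 0.5772156649015329  # -psi(1)

def zeta(n, cutoff=100):
    """Riemann zeta at integer n >= 2: partial sum plus an Euler-Maclaurin tail."""
    head = sum(k ** -n for k in range(1, cutoff))
    tail = (cutoff ** (1 - n) / (n - 1) + 0.5 * cutoff ** -n
            + n * cutoff ** (-n - 1) / 12.0)
    return head + tail

def psi(x, tol=1e-12):
    """Euler psi (digamma) function for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    shift = 0.0
    # Recurrence psi(z + 1) = psi(z) + 1/z, used to move x into [0.5, 1.5).
    while x >= 1.5:
        x -= 1.0
        shift += 1.0 / x
    while x < 0.5:
        shift -= 1.0 / x
        x += 1.0
    z = x - 1.0                     # now |z| <= 0.5
    total = -EULER_GAMMA            # psi(1)
    power = z                       # holds z**(n - 1), starting at n = 2
    n = 2
    while True:
        term = (-1) ** n * zeta(n) * power
        total += term
        if abs(term) < tol:
            break
        power *= z
        n += 1
    return total + shift
```

For example, `psi(1.0)` returns $-.5772156649\ldots$ and `psi(2.0)` returns $1 - .5772156649\ldots$, in line with steps 1 and 2 above.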

4.2. Partial Derivatives of $D(\alpha+x,\ n+\beta-x;\ \theta_0)$ 

With 

$$D(\alpha+x,\ n+\beta-x;\ \theta_0) = \int_0^{\theta_0} t^{\alpha+x-1}(1-t)^{n+\beta-x-1}\,dt,$$

it may be deduced that 

$$\frac{\partial D}{\partial\alpha} = \int_0^{\theta_0} t^{\alpha+x-1}(1-t)^{n+\beta-x-1}\log t\;dt$$

and 

$$\frac{\partial D}{\partial\beta} = \int_0^{\theta_0} t^{\alpha+x-1}(1-t)^{n+\beta-x-1}\log(1-t)\;dt.$$

With $x \ge c \ge 1$ and $0 < \theta_0 < 1$, the integrating functions for both 
partial derivatives are continuous with respect to $t$ provided they 
are taken as zero at $t = 0$. Hence, the process of differentiation 
under the integral sign is legitimate. Let 

$$G(u,v;\theta_0) = \int_0^{\theta_0} t^{u-1}(1-t)^{v-1}\log t\;dt, \qquad u \ge 1,\ v > 0.$$

Then 

$$\frac{\partial D}{\partial\alpha} = G(\alpha+x,\ n+\beta-x;\ \theta_0).$$

To compute the partial derivative $\partial D/\partial\beta$, let $z = 1 - t$ in the 
previous integral defining this derivative. It follows that 

$$\frac{\partial D}{\partial\beta} = \int_{1-\theta_0}^{1} z^{n+\beta-x-1}(1-z)^{\alpha+x-1}\log z\;dz
= \int_0^1 z^{n+\beta-x-1}(1-z)^{\alpha+x-1}\log z\;dz - G(n+\beta-x,\ \alpha+x;\ 1-\theta_0).$$

From Section 4.1, it may then be deduced that 

$$\frac{\partial D}{\partial\beta} = B(n+\beta-x,\ \alpha+x)\big(\psi(n+\beta-x) - \psi(n+\alpha+\beta)\big) - G(n+\beta-x,\ \alpha+x;\ 1-\theta_0).$$

The computation of $G(u,v;\theta_0)$ is carried out as follows. 

1. For $1 \le u < 2$ and $0 < v < 2$, the 32-point Gaussian-Hermite 
quadrature is used to integrate the function $t^{u-1}(1-t)^{v-1}\log t$ 
on the interval $(0,\theta_0)$, then on the two intervals 
$(0,\theta_0/2)$ and $(\theta_0/2,\theta_0)$. If the relative change between 
the two resulting $G$ integrals is less than a tolerance 
error EPS, then the numerical quadrature stops. Otherwise, 
it will be carried out on the four subintervals $(0,\theta_0/4)$, 
$(\theta_0/4,\theta_0/2)$, $(\theta_0/2,3\theta_0/4)$, and $(3\theta_0/4,\theta_0)$ and the resulting 
integral will be compared with the one obtained via 
two subintervals. The process continues until the relative 
change between these integrals is less than EPS. The 
tolerance error EPS is set at .00005 in this paper. 

2. For other values of $u$ and $v$, the following lemma is used 
to reduce $u$ and $v$ to two values $u'$ and $v'$ such that 
$1 \le u' < 2$ and $0 < v' < 2$. 

Lemma. We have 

$$G(u,v+1;\theta_0) + G(u+1,v;\theta_0) = G(u,v;\theta_0)$$

and 

$$u\,G(u,v+1;\theta_0) - v\,G(u+1,v;\theta_0) = H$$

where 

$$H = \theta_0^{u}(1-\theta_0)^{v}\left(\log\theta_0 - \frac{1}{u+v}\right) - \frac{v\,D(u,v;\theta_0)}{u+v}.$$

Proof. The proof for the first formula is as follows. 

$$G(u+1,v;\theta_0) = \int_0^{\theta_0} t^{u}(1-t)^{v-1}\log t\;dt
= \int_0^{\theta_0} t^{u-1}\big[1-(1-t)\big](1-t)^{v-1}\log t\;dt$$
$$= -\int_0^{\theta_0} t^{u-1}(1-t)^{v}\log t\;dt + \int_0^{\theta_0} t^{u-1}(1-t)^{v-1}\log t\;dt
= -G(u,v+1;\theta_0) + G(u,v;\theta_0).$$

As for the second formula, let us integrate by parts the 
integral 

$$G(u,v+1;\theta_0) = \int_0^{\theta_0} t^{u-1}(1-t)^{v}\log t\;dt.$$

Let 

$$Y = t^{u-1}(1-t)^{v}$$

and 

$$dZ = \log t\;dt.$$

Then 

$$dY = \big((u-1)\,t^{u-2}(1-t)^{v} - v\,t^{u-1}(1-t)^{v-1}\big)\,dt$$

and 

$$Z = t\log t - t.$$

Hence 

$$G(u,v+1;\theta_0) = YZ\,\Big|_{t=0}^{\theta_0} - \int_0^{\theta_0} Z\,dY$$
$$= \theta_0^{u}(1-\theta_0)^{v}(\log\theta_0 - 1)
- (u-1)\int_0^{\theta_0} t^{u-1}(1-t)^{v}\log t\;dt
+ v\int_0^{\theta_0} t^{u}(1-t)^{v-1}\log t\;dt$$
$$\qquad + (u-1)\int_0^{\theta_0} t^{u-1}(1-t)^{v}\,dt
- v\int_0^{\theta_0} t^{u}(1-t)^{v-1}\,dt.$$

Algebraic manipulations will yield 

$$G(u,v+1;\theta_0) = -(u-1)\,G(u,v+1;\theta_0) + v\,G(u+1,v;\theta_0) + H$$

where $H$ is defined in the lemma. The second formula of the 
lemma is thus proved. 

The reduction of the range of $u$ and/or $v$ may now be 
accomplished by using the following recurrence formulae: 

$$G(u+1,v;\theta_0) = \big(u\,G(u,v;\theta_0) - H\big)/(u+v)$$

and 

$$G(u,v+1;\theta_0) = \big(v\,G(u,v;\theta_0) + H\big)/(u+v).$$


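4.3. Partial Derivatives of $F_p(\alpha,\beta)$ 

From the expression 

$$F_p = \frac{1}{B(\alpha,\beta)}\sum_{x=c}^{n}\binom{n}{x}\,D(\alpha+x,\ n+\beta-x;\ \theta_0)$$

it follows that 

$$\frac{\partial F_p}{\partial\alpha} = \left[\sum_{x=c}^{n}\binom{n}{x}\,\partial D(\alpha+x,\ n+\beta-x;\ \theta_0)/\partial\alpha - F_p\,\partial B(\alpha,\beta)/\partial\alpha\right]\Big/ B(\alpha,\beta)$$
$$= \sum_{x=c}^{n}\binom{n}{x}\,G(\alpha+x,\ n+\beta-x;\ \theta_0)\Big/ B(\alpha,\beta) - F_p\big(\psi(\alpha) - \psi(\alpha+\beta)\big)$$

and 

$$\frac{\partial F_p}{\partial\beta} = \sum_{x=c}^{n}\binom{n}{x}\Big[B(n+\beta-x,\ \alpha+x)\big(\psi(n+\beta-x) - \psi(n+\alpha+\beta)\big) - G(n+\beta-x,\ \alpha+x;\ 1-\theta_0)\Big]\Big/ B(\alpha,\beta) - F_p\big(\psi(\beta) - \psi(\alpha+\beta)\big).$$

The computations may be simplified by noting that 

$$D(\alpha+x+1,\ n+\beta-x-1;\ \theta_0) = \Big(-\theta_0^{\alpha+x}(1-\theta_0)^{n+\beta-x-1} + (\alpha+x)\,D(\alpha+x,\ n+\beta-x;\ \theta_0)\Big)\Big/(n+\beta-x-1),$$

and hence 

$$G(\alpha+x+1,\ n+\beta-x-1;\ \theta_0) = \Big(-\theta_0^{\alpha+x}(1-\theta_0)^{n+\beta-x-1}\log\theta_0 + D(\alpha+x,\ n+\beta-x;\ \theta_0) + (\alpha+x)\,G(\alpha+x,\ n+\beta-x;\ \theta_0)\Big)\Big/(n+\beta-x-1).$$

Also, 

$$G(n+\beta-x-1,\ \alpha+x+1;\ 1-\theta_0) = \Big((1-\theta_0)^{n+\beta-x-1}\theta_0^{\alpha+x}\log(1-\theta_0) - D(n+\beta-x-1,\ \alpha+x+1;\ 1-\theta_0) + (\alpha+x)\,G(n+\beta-x,\ \alpha+x;\ 1-\theta_0)\Big)\Big/(n+\beta-x-1).$$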

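4.4. Partial Derivatives of $F_n(\alpha,\beta)$ 

From the expression of $F_n$ in Section 2, namely 

$$F_n = \frac{1}{B(\alpha,\beta)}\sum_{y=d}^{n}\binom{n}{y}\,D(\beta+y,\ n+\alpha-y;\ 1-\theta_0),$$

it follows that 

$$\frac{\partial F_n}{\partial\beta} = \sum_{y=d}^{n}\binom{n}{y}\,G(\beta+y,\ n+\alpha-y;\ 1-\theta_0)\Big/ B(\alpha,\beta) - F_n\big(\psi(\beta) - \psi(\alpha+\beta)\big)$$

and 

$$\frac{\partial F_n}{\partial\alpha} = \sum_{y=d}^{n}\binom{n}{y}\Big[B(\beta+y,\ n+\alpha-y)\big(\psi(n+\alpha-y) - \psi(n+\alpha+\beta)\big) - G(n+\alpha-y,\ \beta+y;\ \theta_0)\Big]\Big/ B(\alpha,\beta) - F_n\big(\psi(\alpha) - \psi(\alpha+\beta)\big).$$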

5. NUMERICAL ILLUSTRATION 

Suppose that on a five-item test, the number of students 
having scores of 0, 1, 2, 3, 4, and 5 are 4, 14, 9, 17, 21, and 26 
respectively. Altogether there are $m = 91$ students. It follows 
that $\bar{x} = 3.264$ and $s = 1.562$. The moment estimates for $\alpha$ and $\beta$ are 
$\hat\alpha = 1.611$ and $\hat\beta = .857$. The estimated covariance matrix of $(\hat\alpha,\hat\beta)$ 
is defined by the elements $\hat\sigma_{11} = .18859$, $\hat\sigma_{12} = .08318$, and 
$\hat\sigma_{22} = .0503$. Let $\theta_0 = .80$ and $c = 4$. The estimated error rates 
are then $\hat{F}_p = .180$ and $\hat{F}_n = .031$. The values of the partial 
derivatives evaluated at $(\hat\alpha,\hat\beta)$ are $\partial F_p/\partial\alpha = .02281$, $\partial F_p/\partial\beta = .06926$, 
$\partial F_n/\partial\alpha = .01229$, and $\partial F_n/\partial\beta = -.01464$. Thus, the estimated standard 
errors for $\hat{F}_p$ and $\hat{F}_n$ are $s_\infty(\hat{F}_p) = .025$ and $s_\infty(\hat{F}_n) = .003$ respectively. 
The estimated correlation between $\hat{F}_p$ and $\hat{F}_n$ is $\hat\rho = .597$. These 
data may be of use in estimating other parameters. For example, 
let $\gamma$ be the proportion of examinees classified correctly by the 
test scores. Then an estimate for $\gamma$ is $\hat\gamma = 1 - (\hat{F}_p + \hat{F}_n) = .789$ 
which is associated with an estimated standard error of 

$$s_\infty(\hat\gamma) = \Big[s_\infty^{2}(\hat{F}_p) + s_\infty^{2}(\hat{F}_n) + 2\hat\rho\,s_\infty(\hat{F}_p)\,s_\infty(\hat{F}_n)\Big]^{.5}
= \Big[(.025)^{2} + (.003)^{2} + 2 \times .597 \times .025 \times .003\Big]^{.5} = .027.$$
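The standard errors and the correlation quoted in this illustration can be reproduced from the covariance elements and partial derivatives by the quadratic forms of Section 3. The sketch below uses a hypothetical function name and takes the quoted covariance elements to be those of $(\hat\alpha,\hat\beta)$ themselves, so no further division by $\sqrt{m}$ is applied; that reading is an assumption, adopted because it matches the reported values.

```python
import math

def delta_method(cov, grad_p, grad_n):
    """Standard errors of F_p and F_n and their correlation.

    cov    = (s11, s12, s22): covariance elements of (alpha_hat, beta_hat)
    grad_p = (dF_p/d alpha, dF_p/d beta)
    grad_n = (dF_n/d alpha, dF_n/d beta)
    """
    s11, s12, s22 = cov

    def quad(g, h):
        # Bilinear form g' * Sigma * h for a symmetric 2 x 2 matrix Sigma.
        return s11 * g[0] * h[0] + s12 * (g[0] * h[1] + g[1] * h[0]) + s22 * g[1] * h[1]

    se_p = math.sqrt(quad(grad_p, grad_p))
    se_n = math.sqrt(quad(grad_n, grad_n))
    rho = quad(grad_p, grad_n) / (se_p * se_n)
    return se_p, se_n, rho

# Values quoted in the five-item illustration.
se_p, se_n, rho = delta_method(
    cov=(0.18859, 0.08318, 0.0503),
    grad_p=(0.02281, 0.06926),
    grad_n=(0.01229, -0.01464),
)
```

With these inputs the sketch gives standard errors that round to .025 and .003 and a correlation close to the reported .597.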

6. TABLES FOR $F_p$, $V_{fp}$, $F_n$, $V_{fn}$, AND $\rho$ 

Tables are presented in Appendix A which facilitate the computations 
for the false positive and false negative error rates, their 
standard errors of estimate, and their correlation. As indicated 
previously, this information may serve as the basis for the computation 
of statistics such as the proportion of correct decisions 
and its standard error. All computations were carried out via the 
Amdahl V-6 System with the double precision mode in use whenever 
feasible. 

Input to the tables are (i) number of test items $n$, (ii) criterion 
level $\theta_0$, (iii) passing score $c$, (iv) test mean $\bar{x}$, and 
(v) the KR21 reliability $\hat\alpha_{21}$. It may be noted that if estimates 
of the parameters $\alpha$ and $\beta$ other than the moment estimates are used, 
then the entries for test mean and KR21 are simply $n\alpha/(\alpha+\beta)$ and 
$n/(n+\alpha+\beta)$, respectively, evaluated at those estimates. 
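The conversion between $(\alpha,\beta)$ and the two table entries is a one-line computation in each direction. The function names below are hypothetical; the check uses the values listed for Case 1 of Table 1 ($n = 5$, $\alpha = 1.2515$, $\beta = .4367$).

```python
def table_entries(n, alpha, beta):
    """Test mean and KR21 reliability implied by beta-binomial parameters."""
    mean = n * alpha / (alpha + beta)
    kr21 = n / (n + alpha + beta)
    return mean, kr21

def parameters_from_entries(n, mean, kr21):
    """Inverse map: (alpha, beta) from the test mean and KR21 (0 < kr21 < 1)."""
    total = n / kr21 - n            # alpha + beta
    alpha = mean * total / n
    return alpha, total - alpha

mean, kr21 = table_entries(5, 1.2515, 0.4367)
alpha, beta = parameters_from_entries(5, mean, kr21)
```

The inverse map is the same calculation as the moment estimates, written in terms of the table inputs.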

For each entry $(n, \theta_0, c, \bar{x}, \hat\alpha_{21})$, five values may be read 
out. They are $\hat{F}_p$, $V_{fp}$, $\hat{F}_n$, $V_{fn}$, and $\rho$. 

The tables are constructed for $n = 5\,(1)\,10$ and $\hat\alpha_{21} = .10\,(.10)\,.90$. 
For each $n$, the test mean is chosen such that $\bar{x}/n = .10\,(.10)\,.90$. 
The criterion level is set at $\theta_0 = .60$, $.70$, and $.80$, and the passing 
score is one or two values approximately equal to or larger 
than $n\theta_0$. 
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Numerical Example 1 

Let $n = 10$, $\theta_0 = .6$, and $c = 6$. For $\bar{x} = 5.0$ and $\hat\alpha_{21} = .60$, 
the tables yield the values $F_p = .1667$, $V_{fp} = .1858$, $F_n = .0504$, 
$V_{fn} = .0548$, and $\rho = .2941$. If the data are obtained from 100 
examinees, then the estimated standard errors are $s_\infty(\hat{F}_p) = 
.1858/10 = .0186$ and $s_\infty(\hat{F}_n) = .0055$. It may be deduced that the 
proportion of correct decisions is .7829 for which the standard 
error is estimated as .0241. 

It may be observed from these tables that the relationship of 
each of the quantities $F_p$, $V_{fp}$, $F_n$, $V_{fn}$, and $\rho$ with respect to 
either $\bar{x}$ or $\hat\alpha_{21}$ is rather unpredictable. Hence interpolation for 
nontabulated entries should be carried out with care since the 
relationship is obviously not linear. For such a case it is 
recommended that Lagrange interpolations with three or four points 
be used whenever possible. Details regarding interpolations of 
this type may be found in Abramowitz and Stegun (1968, Section 25.2). 
The four-point Lagrange interpolation has been described in 
Section 4.1. 

Numerical Example 2 

Let $n = 10$, $\theta_0 = .6$, and $c = 6$, along with $\bar{x} = 4.0$ and 
$\hat\alpha_{21} = .22$. Using the four-point Lagrange interpolation for the 
false positive error, we have $\psi_{-1} = .1784$, $\psi_0 = .1883$, $\psi_1 = .1886$, 
and $\psi_2 = .1799$. With $p = (.22 - .20)/.1 = .2$, it may be found that 
the interpolated false positive error is .1891. 
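The interpolation in this example can be reproduced with a direct transcription of the four-point Lagrange formula of Section 4.1. The function name is hypothetical; the four arguments are the tabled values at KR21 = .10, .20, .30, and .40, and $p$ is the fractional position between the second and third of them.

```python
def lagrange4(y_m1, y0, y1, y2, p):
    """Four-point Lagrange interpolation at fractional position p in [0, 1],
    measured from y0 toward y1 (equally spaced tabled values)."""
    return (-p * (p - 1) * (p - 2) / 6 * y_m1
            + (p * p - 1) * (p - 2) / 2 * y0
            - p * (p + 1) * (p - 2) / 2 * y1
            + p * (p * p - 1) / 6 * y2)

# Numerical Example 2: KR21 = .22 lies at p = .2 between .20 and .30.
fp_interp = lagrange4(0.1784, 0.1883, 0.1886, 0.1799, 0.2)
```

At $p = 0$ and $p = 1$ the formula returns the second and third tabled values exactly, which is a convenient sanity check on the coefficients.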

7. FINITE-SAMPLE PERFORMANCE OF THE 
ASYMPTOTIC STANDARD ERRORS 

So far only an asymptotic treatment has been presented for 
the estimates of the false positive and false negative error rates 
$F_p$ and $F_n$. An obvious question which needs to be answered is, at 
what minimum sample size $m$ will the asymptotic standard errors 
$s_\infty(\hat{F}_p) = V_{fp}/\sqrt{m}$ and $s_\infty(\hat{F}_n) = V_{fn}/\sqrt{m}$ represent adequately the actual 
standard errors? A theoretical consideration of this issue is 
rather complex since it involves a joint examination of the speed 
at which $W = \sqrt{m}\,(\hat\alpha-\alpha,\ \hat\beta-\beta)$ converges to its asymptotic bivariate 
normal distribution and of the adequacy of representing the 
functions $F_p(\alpha,\beta)$ and $F_n(\alpha,\beta)$ by their Taylor expansions based on 
the first partial derivatives. Some work regarding the convergence 
speed of univariate maximum likelihood estimates is summarized in 
Kendall and Stuart (1967, Vol. 2, p. 46-48). An extension of this 
work would be needed for any theoretical consideration of the 
finite-sample behavior of the asymptotic errors. 

For this report, simulations employing the IMSL random generator 
GGUB were used to assess the performance of $s_\infty(\hat{F}_p)$ and $s_\infty(\hat{F}_n)$. 
An additional issue under study was the degree of bias of $\hat{F}_p$ and $\hat{F}_n$ 
as estimates of the parameters $F_p$ and $F_n$. (It may be recalled that 
both estimates are asymptotically unbiased.) 

Five beta-binomial distributions (summarized in Table 1) were 
used in the simulation study. Four tests consisting of $n = 5$, 10, 
15, and 20 items each were formed by random selection of items from 
the Comprehensive Tests of Basic Skills, Form S, Level 1, which had 
been used in a large statewide testing program. The frequency 
distribution for each of these tests was then altered slightly so that 
the resulting distribution would conform almost exactly to that of 
a (marginal) beta-binomial distribution. Relevant information 
regarding these distributions is listed in Table 1. The other 
beta-binomial distribution, with $\alpha = 8.970$ and $\beta = 1.994$, is 
similar to the one used in the Wilcox study (1977). 

TABLE 1 

Descriptions of the Five Test Data Used in the Simulation 

Case  Source    n     Mean       SD      alpha     beta     KR21   theta_0    c

 1    CTBS      5    3.7066    1.5445    1.2515   0.4367   .7476     .5       3
 2    CTBS     10    7.4702    2.9435    1.1285   0.3822   .8688     .6       6
 3    CTBS     15    8.8630    3.3588    3.3273   2.3039   .7271     .8      12
 4    CTBS     20   11.1811    5.1115    1.9115   1.5077   .8540     .6      12
 5    Wilcox   10    8.1814    1.6147    8.9703   1.9940   .4770     .8       8

The criterion levels $\theta_0$ were chosen to be .5, .6, and .8 and 
the passing score $c$ is put at $n\theta_0$. The sample size $m$ is set at 25, 
50, 100, 200, 400, and 800. 
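A sample from one of these beta-binomial distributions can be generated in two stages: draw $\theta$ from the beta density, then draw the score from the binomial given $\theta$. The sketch below uses Python's standard `random` module rather than the IMSL generator GGUB named in the text, and the function name and seed are illustrative only.

```python
import random

def simulate_scores(n, alpha, beta, m, rng):
    """Draw m beta-binomial test scores on an n-item test."""
    scores = []
    for _ in range(m):
        theta = rng.betavariate(alpha, beta)       # examinee's true ability
        scores.append(sum(rng.random() < theta for _ in range(n)))
    return scores

# Case 5 of Table 1: n = 10, alpha = 8.9703, beta = 1.9940.
rng = random.Random(12345)
scores = simulate_scores(10, 8.9703, 1.9940, 20000, rng)
sample_mean = sum(scores) / len(scores)
```

For a large sample the mean should fall close to $n\alpha/(\alpha+\beta) = 8.1814$, the value tabled for Case 5. One replication of the study would feed a sample of size $m$ into the moment estimates and the error-rate formulas.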

For each situation listed in Table 1, two thousand replications 
were used to estimate the means of $\hat{F}_p$ and $\hat{F}_n$, and their finite-sample 
standard errors of estimate $s_m(\hat{F}_p)$ and $s_m(\hat{F}_n)$. The moment 
estimates were used when $\hat\alpha_{21}$ was positive. For $\hat\alpha_{21}$ negative or 
zero, the procedure used by Wilcox (1977, p. 295) was adopted. In 
other words, for these situations, the beta-binomial is considered 
to have degenerated to a binomial distribution $(n,\lambda)$ where $\lambda = \bar{x}/n$. 
If $\lambda \ge \theta_0$, only false negative errors may be committed, for which 
the likelihood is 

$$F_n = \sum_{x=0}^{c-1}\binom{n}{x}\lambda^{x}(1-\lambda)^{n-x}.$$

When $\lambda < \theta_0$, only false positive errors may occur with a rate of 

$$F_p = \sum_{x=c}^{n}\binom{n}{x}\lambda^{x}(1-\lambda)^{n-x}.$$
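The degenerate binomial case reduces to two tail sums. A minimal sketch, with a hypothetical function name, follows.

```python
import math

def degenerate_error_rates(n, c, theta0, mean):
    """(F_p, F_n) when the beta-binomial degenerates to binomial(n, lam)."""
    lam = mean / n
    pmf = [math.comb(n, x) * lam ** x * (1 - lam) ** (n - x) for x in range(n + 1)]
    if lam >= theta0:
        # Every examinee is a true master: only false negatives can occur.
        return 0.0, sum(pmf[:c])
    # Every examinee is a true nonmaster: only false positives can occur.
    return sum(pmf[c:]), 0.0

# Five items, passing score 3, criterion level .5.
fp_low, fn_low = degenerate_error_rates(5, 3, 0.5, 2.0)    # lam = .4 < theta_0
fp_high, fn_high = degenerate_error_rates(5, 3, 0.5, 2.5)  # lam = .5 >= theta_0
```

With $\lambda = .4$ the false positive rate is $P(x \ge 3) = .31744$; with $\lambda = .5$ the false negative rate is $P(x \le 2) = .5$.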

The moment estimates receive more attention than the ML estimates in 
this discussion because (i) they are likely to be used in practical 
situations, especially where computer facilities are not available, 
(ii) they are asymptotically equivalent to the maximum likelihood 
(ML) estimates, and (iii) iteration for ML estimates (which are the 
best asymptotically normal estimates) is time consuming and may not 
converge in small samples. (See Zacks (1971, Section 5.2) for 
additional remarks about ML estimates.) However, simulations for 
the ML estimates were also conducted to provide comparative data 
for the ML and moment estimates. (In the rare instances where the 
ML iteration did not converge, the moment estimates were used.) 

Table 2 reports the empirical means of the estimates $\hat{F}_p$ and 
$\hat{F}_n$. Enclosed within parentheses are the empirical means based on 
the ML estimates. The data indicate that the means of the moment 
estimates and the corresponding means of the ML estimates are 
almost identical when $m$ is at least 50. The degree of bias (as 
measured by the discrepancy between the empirical means and their 
population values) appears noticeable only in some instances when 
$m = 25$. In practically all instances, the bias seems negligible. 

TABLE 2 

Empirical Means of the Estimates F_p and F_n 
(and of their Maximum Likelihood Counterparts) 

               Pop.               Empirical mean at m =
Case  Error   Value      25      50     100     200     400     800

 1    F_p     .040      .037    .038    .039    .040    .040    .040
                       (.037)  (.039)  (.039)  (.040)  (.040)  (.040)
      F_n     .061      .059    .060    .060    .060    .060    .061
                       (.062)  (.061)  (.061)  (.061)  (.061)  (.061)

 2    F_p     .051      .049    .050    .051    .051    .051    .051
                       (.050)  (.051)  (.051)  (.051)  (.051)  (.051)
      F_n     .027      .027    .027    .027    .027    .027    .027
                       (.028)  (.028)  (.027)  (.027)  (.027)  (.027)

 3    F_p     .120      .118    .119    .119    .119    .119    .119
                       (.120)  (.119)  (.120)  (.119)  (.120)  (.120)
      F_n     .024      .023    .024    .024    .024    .024    .024
                       (.022)  (.023)  (.023)  (.024)  (.024)  (.024)

 4    F_p     .078      .078    .078    .078    .078    .078    .078
                       (.081)  (.079)  (.079)  (.078)  (.078)  (.078)
      F_n     .041      .041    .041    .041    .041    .041    .041
                       (.042)  (.042)  (.042)  (.041)  (.041)  (.041)

 5    F_p     .157      .151    .153    .156    .156    .157    .157
                       (.149)  (.154)  (.157)  (.156)  (.157)  (.157)
      F_n     .072      .078    .076    .073    .073    .072    .072
                       (.080)  (.077)  (.074)  (.073)  (.072)  (.072)

Table 3 reports the empirical values of $\sqrt{m}\,s_m(\hat{F}_p)$ and $\sqrt{m}\,s_m(\hat{F}_n)$ 
along with the corresponding values simulated for the ML estimates. 
The data indicate that for the situations under study, the moment 
estimates and the ML estimates behave almost identically in terms 
of sampling variability. The data also show that the asymptotic 
values $V_{fp}$ and $V_{fn}$ tend to underestimate the finite-sample values 
$\sqrt{m}\,s_m(\hat{F}_p)$ and $\sqrt{m}\,s_m(\hat{F}_n)$. The reader may deduce from the line 
$\theta_0 = .80$ of Table II of Wilcox (1977) that $\sqrt{m}\,s_m(\hat{F}_p) = .130 \times \sqrt{10} 
= .411$ for $m = 10$, and $= .072 \times \sqrt{30} = .394$ for $m = 30$. The asymptotic 
value is .212. Thus the asymptotic standard error tends to be 
smaller than the actual error. The magnitude of underestimation 
is substantial when $m$ is small and $\hat\alpha_{21}$ is moderate. (See Case 5 




TABLE 3 

Empirical Values of sqrt(m) s_m(F_p) and sqrt(m) s_m(F_n) 
(and of their Maximum Likelihood Counterparts) 

              Asymp-
              totic              Empirical values at m =
Case  Error   Value      25      50     100     200     400     800

 1    F_p     .052      .060    .057    .054    .053    .053    .054
                       (.059)  (.056)  (.055)  (.054)  (.052)  (.052)
      F_n     .088      .092    .091    .091    .092    .090    .091
                       ( ?  )  ( ?  )  (.089)  (.088)  (.088)  (.091)

 2    F_p     .058      .063    .061    .060    .060    .060    .060
                       (.063)  (.059)  (.058)  (.057)  (.058)  (.058)
      F_n     .033      .036    .035    .035    .035    .035    .035
                       (.036)  (.035)  (.034)  (.033)  (.034)  (.034)

 3    F_p     .102      .117    .109    .105    .103    .101    .103
                       (.122)  (.104)  (.106)  (.105)  (.104)  (.102)
      F_n     .040      .03?    .041    .039    .041    .040    .040
                       (.043)  (.042)  (.041)  (.040)  (.042)  (.041)

 4    F_p     .068      .072    .070    .070    .071    .072    .070
                       (.076)  (.070)  (.069)  (.069)  (.068)  (.068)
      F_n     .041      .0?8    .036    .035    .036    .036    .036
                       (.039)  (.035)  (.036)  (.036)  (.035)  (.035)

 5    F_p     .212      .375    .287    .233    .221    .218    .211
                       (.375)  (.264)  (.234)  (.215)  (.205)  (.216)
      F_n     .105      .177    .156    .125    .115    .111    .108
                       (.192)  (.168)  (.123)  (.115)  (.111)  (.108)



with $m = 25$ or 50.) In other situations where $\hat\alpha_{21}$ is reasonably 
large, the degree of underestimation is not large even with 
samples of size 25. 

8. SUMMARY 

This paper describes an asymptotic inferential procedure for 
the estimates of the false positive and false negative error rates. 
Formulae and tables are described for the computations of the 
standard errors. A simulation study indicates that the asymptotic 
standard errors may be used even with samples of 25 cases as long 
as the Kuder-Richardson Formula 21 reliability is reasonably large. 
Otherwise, a large sample would be required. 
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APPENDIX A 

Tables of the False Positive Error and Its Standard 
Error (times $\sqrt{m}$), the False Negative Error and Its Standard 
Error (times $\sqrt{m}$), and the Correlation Between $\hat{F}_p$ and $\hat{F}_n$ 

(M = number of subjects, denoted by $m$ in the text) 

Input to the tables are (i) number of test items $n$, (ii) criterion 
level $\theta_0$, (iii) mastery (passing) score $c$, (iv) test mean $\bar{x}$, 
and (v) the KR21 reliability estimate. It may be noted that if 
estimates of the parameters $\alpha$ and $\beta$ other than the moment 
estimates are used, then the entries for test mean and KR21 are simply 
$n\alpha/(\alpha+\beta)$ and $n/(n+\alpha+\beta)$, respectively, evaluated at those estimates. 

For each entry $(n, \theta_0, c, \bar{x}, \hat\alpha_{21})$, five values may be read 
out. They are $F_p$, $V_{fp}$, $F_n$, $V_{fn}$, and $\rho$, respectively. 

Numerical Example 

Let $n = 10$, $\theta_0 = .60$, and $c = 6$. For $\bar{x} = 5.0$ and $\hat\alpha_{21} = .60$, 
the tables yield the values $F_p = .1667$, $V_{fp} = .1858$, $F_n = .0504$, 
$V_{fn} = .0548$, and $\rho = .2941$. 

Table of the False Positive Error and Its 
S.E.*SQRT(M), the False Negative Error and Its 
S.E.*SQRT(M), and the Correlation between FP and FN 
Number of Items: 5, Theta Zero: .60, Mastery Score: 3 

Test                                          KR21 =
Mean          .100     .200     .300     .400     .500     .600     .700     .800     .900

0.5   F_p    .0129    .0184    .0249    .0321    .0384    .0419    .0411    .0347    .0215
      V_fp   .0881    .1081    .1264    .1321    .1198    .0979    .0801    .0710    .0583
      F_n    .0000    .0000    .0000    .0003    .0011    .0026    .0045    .0058    .0050
      V_fn   .0000    .0000    .0011    .0063    .0145    .0208    .0210    .0157    .0116
      rho    .8848    .9009    .9126    .9025    .8605    .7584    .5915    .5751    .9151

1.0   F_p    .0676    .0734    .0890    .0965    .0986    .0943    .0833    .0651    .0386
      V_fp   .2162    .2289    .2172    .1835    .1506    .1288    .1151    .1012    .0777
      F_n    .0000    .0000    .0005    .0022    .0051    .0086    .0115    .0124    .0096
      V_fn   .0000    .0016    .0125    .0274    .0361    .0357    .0284    .0198    .0164
      rho    .7526    .7898    .7710    .6851    .5356    .3799    .3490    .6038    .9460

1.5   F_p    .1740    .1835    .1851    .1776    .1630    .1430    .1182    .0882    .0508
      V_fp   .3255    .2897    .2405    .2157    .1970    .1755    .1508    .1228    .0885
      F_n    .0000    .0008    .0042    .0093    .0146    .0186    .0205    .0193    .0135
      V_fn   .0009    .0264    .0554    .0627    .0554    .0426    .0303    .0231    .0203
      rho    .6040    .5073    .2413    .0267   -.0338    .0586    .3290    .7330    .9675

2.0   F_p    .3221    .3056    .2755    .2421    .2082    .1742    .1392    .1016    .0579
      V_fp   .3616    .4107    .3941    .3363    .2756    .2218    .1754    .1344    .0934
      F_n    .0010    .0088    .0184    .0258    .0302    .0317    .0302    .0254    .0163
      V_fn   .0607    .1329    .1143    .0828    .0579    .0418    .0328    .0282    .0238
      rho    .0066   -.5539   -.5704   -.4354   -.1844    .1766    .5701    .8560    .9799

2.5   F_p    .4346    .3565    .3001    .2549    .2156    .1789    .1427    .1043    .0597
      V_fp  1.3267    .8222    .5504    .3964    .2980    .2290    .1768    .1341    .0934
      F_n    .0235    .0425    .0497    .0511    .0492    .0448    .0385    .0298    .0179
      V_fn   .4264    .1924    .1057    .0732    .0586    .0494    .0420    .0350    .0268
      rho   -.9434   -.8048   -.4868   -.0663    .3055    .5830    .7820    .9159    .9857

3.0   F_p    .2833    .2548    .2299    .2060    .1819    .1?65    .1286    .0964    .0564
      V_fp   .7927    .5035    .3719    .2911    .2341    .1904    .1545    .1227    .0891
      F_n    .1160    .0990    .0858    .0744    .0638    .0535    .0429    .0314    .0180
      V_fn   .3734    .2206    .1541    .1149    .0884    .0689    .0537    .0410    .0288
      rho   -.0492    .1276    .2843    .4323    .5745    .7095    .3317    .9294    .9871

3.5   F_p    .0475    .0934    .1148    .1227    .1220    .1145    .1006    .0795    .0483
      V_fp   .9558    .5065    .3073    .2169    .1699    .1415    .1213    .1036    .0806
      F_n    .1451    .1163    .0954    .0792    .0654    .0531    .0414    .0295    .0164
      V_fn   .4909    .3112    .2044    .1434    .1049    .0786    .0592    .0436    .0292
      rho   -.8742   -.7223   -.4448   -.1015    .2386    .5316    .7585    .9100    .9850

4.0   F_p    .0011    .0139    .0327    .0493    .0609    .0664    .0651    .0557    .0358
      V_fp   .0811    .2505    .2560    .2100    .1603    .1206    .0948    .0813    .0679
      F_n    .0670    .0694    .0652    .0582    .0503    .0419    .0330    .0235    .0129
      V_fn   .1811    .1187    .1065    .0944    .0802    .0664    .0534    .0408    .0273
      rho    .7006    .1539   -.1596   -.1668   -.0239    .2364    .5732    .8535    .9793

4.5   F_p    .0000    .0005    .0039    .0105    .0184    .0254    .0293    .0280    .0194
      V_fp   .0003    .0231    .0749    .1085    .1128    .0967    .0735    .0573    .0492
      F_n    .0129    .0180    .0222    .0243    .0240    .0219    .0183    .0135    .0074
      V_fn   .0880    .0947    .0767    .0568    .0456    .0407    .0368    .0311    .0217
      rho    .9065    .9064    .8432    .6873    .4667    .3415    .4494    .7633    .9691
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Table of the False Positive Error and Its 
S.E.*SQRT(M), the False Negative Error and Its 
S.E.*SQRT(M), and the Correlation between FP and FN 
Number of Items: 5, Theta Zero: .60, Mastery Score: 4 



Test                                          KR21 =
Mean          .100     .200     .300     .400     .500     .600     .700     .800     .900

0.5   F_p    .0011    .0023    .0041    .0066    .0093    .0113    .0120    .0106    .0069
      V_fp   .0143    .0237    .0351    .0425    .0413    .0338    .0257    .0215    .0182
      F_n    .0000    .0000    .0001    .0006    .0027    .0068    .0122    .0167    .0157
      V_fn   .0000    .0001    .0026    .0153    .0373    .0572    .0625    .0498    .0357
      rho    .9487    .9614    .9686    .9641    .9434    .8839    .7387    .6091    .8852

1.0   F_p    .0100    .0145    .0197    .0243    .0271    .0275    .0254    .0205    .0124
      V_fp   .0647    .0802    .0827    .0702    .0541    .0421    .0354    .0312    .0247
      F_n    .0000    .0001    .0012    .0054    .0130    .0228    .0321    .0368    .0303
      V_fn   .0000    .0037    .0297    .0688    .0967    .1028    .0880    .0628    .0497
      rho    .8902    .9246    .9168    .8729    .7727    .6047    .4588    .5653    .9230

1.5   F_p    .0383    .0462    .0511    .0520    .0495    .0446    .0376    .0284    .0166
      V_fp   .1498    .1347    .0972    .0741    .0627    .0550    .0476    .0392    .0287
      F_n    .0000    .0019    .0099    .0231    .0378    .0507    .0587    .0581    .0428
      V_fn   .0019    .0610    .1365    .1655    .1573    .1297    .0961    .0703    .0615
      rho    .8506    .8199    .6593    .4105    .2184    .1710    .3055    .6542    .9534

2.0   F_p    .0973    .0972    .0895    .0793    .0685    .0574    .0459    .0335    .0190
      V_fp   .1945    .1405    .1324    .1140    .0935    .0749    .0588    .0447    .0309
      F_n    .0021    .0206    .0451    .0661    .0810    .0888    .0885    .0780    .0525
      V_fn   .1361    .3228    .3021    .2375    .1?71    .1301    .0985    .0823    .0729
      rho    .6031   -.1545   -.3693   -.3240   -.1643    .0976    .4539    .7991    .9728

2.5   F_p    .1654    .1324    .1090    .0908    .0755    .0617    .0485    .0350    .0198
      V_fp   .5614    .3450    .2237    .1556    .1130    .0840    .0629    .0463    .0314
      F_n    .0536    .1032    .1269    .1365    .1370    .1301    .1160    .0933      ?
      V_fn  1.0248    .5299    .3155    .2175    .1667    .1370    .1174    .1019    .0835
      rho   -.9025   -.7726   -.5290   -.2062    .1389    .4563    .7147    .8937    .9829

3.0   F_p    .1220    .1040    .0901    .0781    .0669    .0561    .0450    .0330    .0189
      V_fp   .4025    .2371    .1652    .1230    .0945    .0737    .0574    .0438    .0305
      F_n    .2823    .2565    .2333    .2105    .1870    .1618    .1337    .1007    .0592
      V_fn   .7452    .4755    .3536    .2791    .2267    .1863    .1532    .1236    .0917
      rho   -.1005    .0816    .2465    .4039    .5552    .6984    .8267    .9282    .9870

3.5   F_p    .0216    .0403    .0474    .0487    .0467    .042?    .0361    .0277    .0163
      V_fp   .4163    .1963    .1115    .0787    .0633    .0535    .0455    .0377    .0280
      F_n    .4126    .3398    .2863    .2430    .2050    .1694    .1342    .0970    .0545
      V_fn  1.2271    .7850    .5350    .3910    .2980    .2319    .1811    .1382    .0956
      rho   -.9293   -.7730   -.4391   -.0234    .3378    .6080    .7997    .9244    .9871


.0005 


.0062 


.0140 


.0203 


.0241 


.0253 


.0239 


.0197 


.0122 


.0375 


.1082 


.1029 


.0788 


.0570 


.0421 


.0340 


.0296 


.0238 


.2655 


.2.537 


.2296 


.2015 


.1723 


.1425 


.1119 


.0795 


.0435 


.3562 


.3708 


.3618 


.3190 


.2694 


.2222 


.1787 


.1370 


.0919 


.0802 


-.4077 


-.4606 


-.3387 


-.0933 


.2628 


.6363 


. 8843 


.9836 


.0000 


.0002 


.0017 


.0044 


.0075 


.0099 


.0110 


.0101 


.0067 


.0001 


.0104 


.0320 


.0437 


.0427 


.0345 


.0254 


.0204 


.0174 


.0902 


.0982 


.1023 


.100* 


.0929 


.0810 


.0656 


.0470 


.0254 


.2508 


.2423 


.2056 


.1785 


.1646 


.1526 


.1355 


.1106 


.0751 


.6534 


.6331 


.4798 


. 2606 


.1258 


.1669 


.4299 


.7997 


.9760 
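
The standard errors in these tables are printed multiplied by SQRT(M), so a reader has to divide by the square root of the sample size before quoting them. The sketch below shows that step, and also how the tabulated FP/FN correlation could be used to get a standard error for the total error FP + FN. It assumes that M denotes the number of examinees, which the table titles do not state. The function names and the value M = 100 are hypothetical. The input numbers are the Mean 0.5, KR21 .500 entries for Number of Items: 5, Theta Zero: .60, Mastery Score: 4.

```python
import math


def standard_error(scaled_se: float, m: int) -> float:
    """Undo the SQRT(M) scaling: S.E. = tabulated value / sqrt(M).

    Assumes m is the number of examinees (an assumption, not stated in the tables).
    """
    if m <= 0:
        raise ValueError("m must be a positive sample size")
    return scaled_se / math.sqrt(m)


def total_error_se(scaled_se_fp: float, scaled_se_fn: float, corr: float, m: int) -> float:
    """Standard error of FP + FN from the two scaled S.E.s and their correlation.

    Uses Var(X + Y) = Var(X) + Var(Y) + 2 * corr * SD(X) * SD(Y).
    """
    se_fp = standard_error(scaled_se_fp, m)
    se_fn = standard_error(scaled_se_fn, m)
    variance = se_fp ** 2 + se_fn ** 2 + 2.0 * corr * se_fp * se_fn
    return math.sqrt(variance)


# Tabulated entries (Mean 0.5, KR21 .500): S.E.*SQRT(M) for FP and FN, and their correlation.
fp_scaled_se, fn_scaled_se, fp_fn_corr = 0.0413, 0.0373, 0.9434
m = 100  # hypothetical number of examinees

print("S.E. of FP:", standard_error(fp_scaled_se, m))
print("S.E. of FN:", standard_error(fn_scaled_se, m))
print("S.E. of FP + FN:", total_error_se(fp_scaled_se, fn_scaled_se, fp_fn_corr, m))
```

Because FP and FN are strongly positively correlated in this cell, the standard error of their sum is close to the sum of the two individual standard errors. It is well above what an independence assumption would give.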
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 5, Theta Zero: .70, Mastery Score: 4

Test KR21

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900

0,5 "oi43 '0237 -o^r-ii^r^ 

•0237 .0367 .0514 .0610 .0593 0478 rnfis miA 

"2222 "2222 - 00M - 0001 - 0005 0018 .0043 .M74 'o082 

"Ssoi -2S22 'T.l -222 - 0092 - 0209 - 0302 0284 

1 n '1*2} '950.9 .9650 .9669 .9606 .9360 8594 6854 8951 

1,0 "Si?? -Sii? - 0204 - 0275 -03A4 0392 !o400 !o353 ' 

.0647 .0823 .0999 .1066 .0972 .0783 0605 0509 0435 

.0000 .0000 .0001 .0007 .0028 0069 !oi24 0170 0161 

'SttO ^Kf -2i?S - 0311 ' 0455 -0*79 :0368 Oltl 

.8990 .9158 .9275 .9222 .8918 .8141 .6567 .5425 .8727 

1,5 *?io? '?67 7 7 'till '2SS « 2 - 069 ° « 0630 'O 509 -° 3 ^ 

«««« * ±~ 7 Z • 1632 • 1363 '1068 .0858 .0737 0648 0520 

•2222 -2222 - 0009 - 0041 .0100 .0175 0245 ' 0233 

'lm -22g -Si, 1 ! •!$?? -25? -? 698 : ° 577 mS :Sn 3 2 

2 0 0985 '?ioi "fiH "SSI " 479 ° - 3581 - 5083 - 9224 

•° :« 9 3 8 S : : : - 0 0 ^ -jju .g«j 

•So 0 ?? X 2 -SK •? ! : ° 348 :88 :8g? 

7filQ "222? -il?? • 1 ? 2A - 0815 -0579 .0414 .0375 

.7619 .7176 .5057 .23U .0637 .0545 .2335 .6426 .9583 

2,5 "moh 'IV$ 'VM "H2J 'J 381 - 1160 - 093 0 .0681 .0390 

.2908 .2501 .2470 .2165 .1796 .1450 1146 0877 nfiii 

fill IslI -?s? : °* 76 :o"m ."owe : 0 ° 

4486 a f?SS 'KK "I 081 '° 763 - 0571 -O^ 90 -0444 

.4486 -.3180 -.4854 -.4253 -.2470 .0563 .4669 .8238 .9781 

3,0 '9205 '5972 -?88 '\V$ 'iS! - 1174 '° 926 - 0670 - 0379 

n?S? *n?H *n2 8 o * 2834 - 2098 - 1537 .1207 .0901 .0616 

'6796 "SffS "?ol 8 - 0864 - 08n8 - 0708 -0560 0342 

qII* -1225 " 2 ?° - 098 1 - 0825 .° 722 -0630 .0508 

3 5 "*2029 "'liS "-?S?/ 7 "J? 91 - 5321 ' 7696 -I 75 9 870 

'6328 '3887 ' 97QQ 'Ull '}IIS -°" 4 - 0800 - 0586 '° 334 

1877 i2SJ *? 7 !? ,2 li5 ,1694 -l 355 .1° 8 1 -0840 .0587 

'5358 'Mil 'Hon 'S2i '° 949 - 0766 - 0563 -°321 

- o?66 'i2S 'JSfS -152S - 1218 - 0982 - 0773 - 0553 

.U-bb ,1658 .3325 .4848 .6255 .7529 .8620 .9441 .9900 

4,0 '2221 '257? -2SJJ "? 7 22 - 0724 - 0666 -0 570 -0436 .0254 

'2366 ' " 2 22n 'iiii '} 112 - 094b - 0825 -0697 0517 

7? "if 92 'I 332 "HOI -0890 .0687 .0482 ,0261 

.7130 .4997 .3462 .2526 .1910 1469 1129 0839 nSt 

, , "-SHJ -' 7243 -- 4250 --0348 3381 6266 8216 9368 '§896 

' "ilia -!g? -ffJ? -2522 : " 94 : "m ioiff :?fg 

a£Z£ •llio .136/ .1187 .0908 .0678 .0548 0485 n^xo 

'2374 '1814 -???5 -?S5i '° 559 - 043? : ° 298 0156 
ri?a 'iHl 'I 539 'I 378 " n82 - 0967 -0733 .0469 
-6138^. 2436^-. 0868 -.1407 -.0041 .2937 .6576 .3984 .9863 
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 5, Theta Zero: .80, Mastery Score: 4

Test KR21

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900

0.5 


.0011 


.0023 


.0041 


.0071 


.0116 


.0174 


.0234 


.0263 


.0212 




.0143 


.0237 


.0368 


.0542 


.0735 


.0857 


.0809 


.0618 


.0500 




A AAA 
• 0030 


A AAA 
• 0000 


A AAA 
. 0000 


A A A A 
. 0000 


A A A A 
. 0000 


AAAO 
. 0003 


AAI A 

.0010 


AAO O 


AAOO 

• 003/ 




A AAA 
.0000 


A AAA 
• 0000 


AAA A 
. 0000 


AAA! 
• 0001 


AA1 1 

. 0011 


AAi 3 

.0043 


A A A / 

.0094 


. 0117 


AAQ A 

• UUoU 




OA 1 A 

. 9410 


. 9961 


• 9681 


€\C O A 

• 9630 


O£01 

• 9631 


• 9536 


Al O O 

. 9173 


O Q AO 

• /o93 


oa a A 

• /40U 


i a 
1.0 


AT A A 
• 0100 


at a c 
. 0145 




AO O C 

• 0285 


AOO/ 

• 0384 


• 0485 


. 0556 


A CA Q 

. 0549 


AOOO 




. 0647 


AO 1 / 

• 0824 


1 AO O 

• 1033 


1 o / o 

• 1248 


1 O C A 

• 1369 


1 OAO 

. 1297 


• 1U5Z 


aqaa 


A7 AA 




A AAA 

• 0000 


A AAA 
• 0000 


A A A A 
• 0000 


A A A A 
• 0000 


A A AO 

. 0003 


AAI O 
. UU1Z 


A AO 1 

• 0031 


Arte c 
• 005 J 


AA£A 




AAA A 

• 0000 


AAA A 
• 0000 


A A AT 
• 0001 


AAI A 
• 0010 


AA/ A 

. 0049 


Al 1 O 

• 0117 


Al O/ 

. 0174 


Al CC 

• 0166 


Al AO 
• 010 J 




. 9394 


a / c c 

.9455 


A O O O 

. 9222 


A O O 1 

„9271 


A O O O 

. 9232 


AA / A 

. 8962 


O 1 o o 

• 8137 


a o o o 
. 6277 


ooo o 


1.5 


.0383 


.0474 


.0586 


.0713 


.0836 


.0920 


.0930 


.0825 


.0552 




.1507 


.1709 


.1903 


.1961 


.1790 


.1467 


.1152 


.0973 


.0856 




A AAA 

. 0000 


A A A A 

. 0000 


A A A A 

. 0000 


A A AO 

.0003 


AA1 / 

.0014 


A A O C 

. 0036 


• 006c 


AAA O 

.0097 


AAOC 

• 0096 




A AAA 

. 0000 


AAA A 

• 0000 


A A A A 

.0009 


A A C A 

.0059 


Al / O 

• 0148 


AO O O 

• 0227 


AO / / 

. 0244 


»\ 1 O c 

. 0185 


Al OA 




. 9760 


A / O O 

.9437 


. 8681 


o c o / 

. 8674 


O o c o 

• 8353 


O C A O 

. 7503 


C O / 1 

. 5841 


/ CI A 

• 4519 


Q A A Q 

. o44o 


O A 

2.0 


A A O C 

. 0986 


11 1 o 

. 1113 


1 O £. A 

• 1260 


1 O O A 

. 1380 


. 1440 


l / l o 

. 1417 


1 OA£ 

. 1296 


1 AC O 

. 1057 


• 0661 




.2548 


.2696 


.2635 


2278 


.1849 


.1526 


.1333 


.1193 


.0985 




.0000 


.0000 


.0003 


.0017 


.0045 


.0084 


.0123 


.0145 


.0123 




A A A A 

. 0000 


AAA ^ 

.0007 


.0082 


A A 1 / 

.0216 


A A 1 o 

.0318 


A A A A 

.0339 


A O O A 

.0279 


A 1 O 1 

• 0181 


ai /. o 
. 014Z 




n i a / 

. 8124 


T T ft A 

.7782 


. 7737 


TAO A 

• 7083 


.5696 


. 3803 


A / A / 

.2494 


/ A A 1 

.4091 


A 1 O O 

.9172 


o c 

2.5 


. 2007 


.2141 


AA1 / 

. 2216 


Ol Al 

. 2191 


O A £. O 

.2068 


1 0£ 1 

• 1861 


1 C O £ 

. 1576 


1 OAO 

. 1207 


AO! O 
. 0/1/ 




A C T A 

. 3512 


A O A 1 

.3291 


A / O A 

. 2680 


AAAA 

. 2292 


A A A ft 

.2098 


^ Aft ^ 

. 1921 


1 OAl 

. 1701 


1 / o o 

. 1432 


1 AO O 

• 1083 




A AAA 

. 0000 


AAA 1 

.0003 


A AO ft 

.0025 


A A / T 

.0067 


Al 1 O 

• 0118 


A 1 / 1 

• 0164 


A 1 A / 

. 0194 


A 1 A O 

. 0193 


Al / O 

. 0143 




A A A 1 

. 0002 


AT O O 

. 0132 


A O O C 

• 0386 


A C 1 O 

. 0512 


A / A 1 

.0491 


AO O O 

. 0387 


A ft £. 1 

• 0264 


Al OA 

. 0180 


Al OO 

« 01/3 




£. / AO 

. 6408 


£. 1 O £ 

. 6136 


/ A A / 

. 4094 


1 OAO 

. 1292 


ACA4 

- .0302 


- . 0693 


1 1 AA 

. 1190 


Cf\C o 

• oOo i 


. 96^5 


3.0 


.3469 


.3412 


.3172 


.2846 


.2485 


.2101 


.1693 


.1243 


.0712 




.3820 


.3611 


.3834 


.3555 


.3069 


.2554 


.2068 


.1612 


.1138 




AAA O 

. 0003 


A A / O 

.0043 


Al O / 

.0124 


Al AT 

.0195 


A O / C 

. 0245 


A A T A 

.0270 


AA / T 

.0267 


A O O 1 

• 0231 


Al C O 

• 015/ 




. 322o 


A A A C 

. 0906 


n A / o 

. 0943 


AO O A 

. 0730 


AC At 

. 0505 


AO O O 

.0337 


A O / O 

. 0247 


AOO O 

• 0222 


AO AO 

. 0209 




. 3417 


- . 355o 


C C TO 

- . 5572 


C O O A 

- .5239 


O £ A / 

- . 3604 


AO A A 

- . 0309 


/ CO A 

. 4594 


Q A "7 A 

. o470 


AOO A 

. 9830 


O c 

3.5 


. 4841 


/ A/ O 

.4047 


O / 1 c 

. 3415 


O O A O 

. 2893 


o / o o 

. 2433 


OA A C 

. 2005 


1 C O o 

• 1583 


1 1 / 1 
• 4 141 


A£/ A 

.0640 




1 O O / 1 

1. 2241 


A A O C 

. 9085 


C / 1 A 

. 6410 


.4728 


a / a ^ 

.3607 


O O Al 

• 2801 


O 1 O A 

.2179 


• 1654 


1 1 oo 
• 1133 




AT 0 / 

. 0134 


AO AO 

• 0302 


AO O 1 

.0381 


A / A "7 

. 0407 


A/ Al 

. 0401 


AO OA 

. 0370 


AO 1 A 

. 0319 


AO/ O 

• 0247 


Al A O 

. 014/ 




• 3240 


i "/A r 

.1/35 


A A 1 1 

. 0911 


A C C C 

.0565 


A / O C 

.0435 


A ft ft ft 

. 0382 


AO / C 

. 0345 


AO A/. 

. 0304 


AO A 1 

• 0241 




A / Al 

- . 9401 


o c c o 

- . 8653 


/ AA 

- . -»438 


AA/.A 

- . 2240 


o c o c 

• 2585 


£ 1 o o 

. 6127 


o o / c 

. 0245 


OA Al 

. 9401 


AAA/ 

. 9906 


a a 
4. 0 


. zy /o 


0£ A A 

. Zo40 




O AOO 




1 ROO 

• 10 Zo 


1 007 


AfiOl 


AAQA 

• U**70 




.9037 


.5804 


.4338 


.3443 


.2811 


.2321 


.1903 


.1501 


.1041 




.0942 


.0794 


.0679 


.0581 


.0490 


.0403 


.0315 


.0224 


.0123 




.3147 


.1884 


.1336 


.1012 


.0792 


.0628 


.0495 


.0377 


.0255 




.0776 


.2704 


.4316 


.5740 


.7002 


.8091 


.8969 


.9593 


.9927 


4.5 


.0182 


.0527 


.0731 


.0821 


.0829 


.0773 


.0663 


.0501 


.0285 




.6065 


.4677 


.3049 


.212/ 


.1662 


.1437 


.1292 


.1115 


.0814 




.0817 


.0687 


.0565 


.0462 


.0373 


.0292 


.0218 


.0146 


.0075 




.2148 


.1987 


.1514 


.1152 


.0887 


.0684 


.0517 


.0371 


.0228 




-.5847 


-.6145 


-.3945 


-.0524 


.3344 


.6510 


.8484 


.9502 


.9920 
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 6, Theta Zero: .60, Mastery Score: 4

Test KR21
Mean    .100    .200    .300    .400    .500    .600    .700    .800    .900

0.6    .0026   .0046   .0077   .0120   .0169   .0212   .0233   .0216   .0146
       .0269   .0403   .0567   .0709   .0746   .0661   .0517   .0411   .0345
       .0000   .0000   .0000   .0002   .0009   .0028   .0057   .0085   .0084
       .0000   .0000   .0005   .0046   .0144   .0260   .0317   .0268   .0181
       .9315   .9386   .9551   .9553   .9426   .9032   .8021   .6626   .8704

1.2    .0227   .0299   .0383   .0464   .0522   .0541   .0511   .0424   .0266
       .1117   .1309   .1398   .1281   .1051   .0330   .0681   .0588   .0471
       .0000   .0000   .0003   .0020   .0055   .0107   .0160   .0192   .0164
       .0001   .0007   .0094   .0282   .0455   .0526   .0472   .0338   .0249
       .8744   .8871   .8899   .8615   .7899   .6637   .5266   .5727   .9114

1.8    .0811   .0923   .1004   .1024   .0987   .0900   .0770   .0593   .0355
       .2293   .2208   .1764   .1399   .1185   .1036   .0896   .0743   .0548
       .0000   .0005   .0039   .0105   .0185   .0259   .0309   .0311   .0233
       .0003   .0206   .0608   .0832   .0837   .0707   .0526   .0373   .0308
       .7972   .7603   .6386   .4385   .2669   .2102   .3182   .6369   .9465

2.4    .1903   .1897   .1766   .1585   .1383   .1171   .0947   .0700   .0408
       .2989   .2403   .2323   .2063   .1731   .1408   .1115   .0851   .0590
       .0006   .0091   .0224   .0346   .0434   .0481   .0481   .0424   .0287
       .0469   .1609   .1654   .1331   .0989   .0718   .0535   .0433   .0366
       .4964   .133?  -.3604  -.3278  -.1706   .0939   .4479   .7877   .9686

3.0    .3098   .2548   .2133   .1800   .1514   .1252   .0997   .0731   .0424
       .8709   .5865   .3954   .2818   .2084   .1570   .1183   .0877   .0598
       .0277   .0579   .0722   .0775   .0771   .0724   .0639   .0510   .0319
       .5905   .3145   .1816   .1226   .0937   .0768   .0645   .0538   .0419
      -.9105  -.7966  -.5491  -.2002   .1623   .4738   .7149   .8855   .9794

3.6    .2149   .1893   .1682   .1489   .1301   .1111   .0909   .0681   .0402
       .6301   .3834   .2799   .2141   .1682   .1335   .1055   .0814   .0575
       .1779   .1559   .1379   .1216   .1059   .0901   .0734   .0549   .0323
       .5184   .3165   .2266   .1725   .1350   .1067   .0840   .0646   .0458
      -.0780   .0991   .2553   .'024   .5439   .6300   .8068   .9140   .9829

4.2    .0288   .0626   .0792   .0857   .0855   .0804   .0708   .0561   .0345
       .6522   .3694   .2223   .1541   .1189   .0976   .0821   .0634   .0522
       .2329   .1913   .1594   .1337   .1117   .0916   .0722   .0522   .0296
       .6562   .4568   .3109   .2225   .1549   .1246   .0944   .0699   .0471
      -.8719  -.7455  -.4849  -.1435   .2016   .4993   .7324   .8947   .9812

4.8    .0004   .0070   .0191   .0310   .0399   .0447   .0447   .0389   .0255
       .0300   .1434   .1671   .1447   .1123   .0846   .0647   .0534   .0439
       .1091   .1129   .1077   .0978   .0857   .0723   .0578   .0418   .0234
       .2475   .1300   .1596   .1436   .1241   .1040   .0846   .0655   .0445
       .6397   .2285  -.0909  -.1360  -.0263   .2025   .5245   .8251   .9742

5.4    .0000   .0002   .0017   .0054   .0107   .0158   .0193   .0193   .0138
       .0000   .0078   .0361   .0625   .0722   .0660   .0513   .0382   .0317
       .0218   .0287   .0350   .0388   .0395   .0370   .0317   .0239   .0136
       .1194   .1317   .1179   .0930   .0741   .0638   .0573   .0492   .0354
       .8936   .8825   .8443   .7378   .5656   .4188   .4483   .7209   .9608
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 6, Theta Zero: .60, Mastery Score: 5

Test KR21
Mean    .100    .200    .300    .400    .500    .600    .700    .800    .900

0.6    .0002   .0005   .0011   .0022   .0037   .0053   .0063   .0061   .0043
       .0032   .0067   .0124   .0187   .0219   .0204   .0159   .0120   .0100
       .0000   .0000   .0000   .0003   .0018   .0057   .0122   .0193   .0207
       .0000   .0000   .0009   .0087   .0238   .0553   .0728   .0670   .0452
       .9611   .9695   .9800   .9800   .9723   .9475   .8722   .7074   .8264

1.2    .0028   .0047   .0075   .0105   .0131   .0145   .0143   .0123   .0^79
       .0242   .0347   .0425   .0415   .0344   .0262   .0202   .0169   .0138
       .0000   .0000   .0006   .0037   .0110   .0222   .0351   .0446   .0408
       .0001   .0012   .0176   .0550   .0943   .1166   .1133   .0865   .0610
       .9198   .9496   .9521   .9356   .8903   .7918   .6305   .5606   .8702

1.8    .0150   .0201   .0245   .0269   .0272   .0256   .0224   .0175   .0106
       .0783   .0824   .0653   .0476   .0367   .0305   .0260   .0218   .0163
       .0000   .0010   .0073   .0204   .0375   .0551   .0690   .0736   .0588
       .0005   .0376   .1168   .1697   .1828   .1662   .1318   .0937   .0744
       .8980   .8985   .8351   .6929   .4995   .3486   .3333   .5516   .9180

2.4    .0490   .0526   .0508   .0465   .0410   .0350   .0284   .0211   .0123
       .1431   .0843   .0701   .061/   .0521   .0425   .0337   .0257   .0178
       .0011   .0168   .0431   .0694   .0909   .1054   .1106   .1026   .0733
       .0834   .3049   .3369   .2926   .2335   .1780   .1325   .1026   .0887
       .7924   .3481  -.0752  -.1727  -.1119   .0531   .3291   .6943   .9529

3.0    .1022   .0831   .0636   .0572   .0476   .0390   .0308   .0224   .0129
       .3036   .2075   .1374   .0957   .0692   .0510   .0377   .0274   .0184
       .0500   .1103   .1442   .1618   .1681   .1648   .1518   .1266   .0826
      1.1073   .6619   .4181   .2925   .2207   .1755   .1453   .1239   .1032
      -.8525  -.7564  -.5606  -.3021  -.0058   .3041   .5992   .8376   .9721

3.6    .0808   .0679   .0583   .0502   .0428   .0353   .0288   .0212   .0123
       .2768   .1584   .1080   .0789   .0596   .0456   .0349   .0251   .0179
       .3413   .3169   .2936   .2697   .2438   .2149   .1812   .1399   .0850
       .3538   .5560   .4181   .3333   .2731   .2265   .1878   .1531   .1158
      -.1869  -.0187   .1411   .3005   .4609   .6199   .7708   .8988   .9803

4.2    .0114   .0237   .0238   .0301   .0292   .0267   .0229   .0177   .0107
       .2502   .1281   .0725   .0500   .0393   .0327   .0273   .0223   .0164
       .5224   .4406   .3770   .3242   .2769   .2318   .1862   .1371   .0791
      1.2797   .8958   .6313   .4711   .3647   .2876   .2271   .1752   .1230
      -.9352  -.8121  -.5234  -.1289   .2399   .5303   .7472   .8983   .9813

4.8    .0001   .0027   .0072   .0113   .0140   .0152   .0143   .0125   .0079
       .0121   .0546   .0600   .0490   .0364   .0267   .0207   .0173   .0140
       .3439   .3315   .3050   .2718   .2356   .1975   .1574   .1137   .0636
       .3972    .4 1   .4159   .3795   .3288   .2765   .2259   .1758   .1200
       .0258  -.3551  -.4446  -.3588  -.1550   .1634   .5439   .8421   .9765

5.4    .0000   .0001   .0006   .0020   .0038   .0055   .0065   .0063   .0043
       .0000   .0031   .0136   .0225   .0247   .0214   .0161   .0121   .0101
       .1224   .1300   .1347   .1335   .1256   .1117   .0922   .0676   .0374
       .2811   .2805   .2542   .2241   .2041   .1887   .1699   .1419   .0989
       .5333   .5686   .4862   .3261   .1900   .1754   .3584   .7284   .9650
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 6, Theta Zero: .70, Mastery Score: 5

Test KR21

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900

.00J2 .0067 .0127 .0219 .0319 0372 rmo P9«;n mo? 

.0000 .C000 .0000 .0000 6o53 .0014 '0041 '0084 'SllO 

• 9 64°o 0 -Tii -SEX -SS? - 0058 - 0180 :S 3 °82 : 

.yb^fU .9637 .9750 .9794 .9784 QfiQO cnAn aom 7701 

' '024 -S^l 7 -ffiZJ -2"? • «™ :0245° S \Vlll 

: 0 °ooo :Sooo : : ; -j™ -g" 8 

"SSS -SSSS -SSJ? -2??? : °? 61 : °° 72 :SS :Si24 : 

.9235 .9308 .9521 .9541 .9432 .9088 .8187 .6522 .8075 

1.8 .0150 .0205 .027/ .0345 .0400 .0426 0414 0354 rmn 

•0784 .0957 .1056 .0980 .0799 0615 0484 0407 "§33? 

.0000 .0000 .0005 .0030 .0089 0179 0282 MS8 n??5 

'Zl -Si, 3 ? -S5" : ° ° 3 : ° 8 * 3 : 0 °?75 8 : I 

2 4 0494 '22SS "XSi -Inll ' 8252 ' 70A9 -5005 .8682 

1684 *?fiR« "?t 7 2 'i 07 ,?! - 07 ° 2 ' 065A - 0569 -0447 .0273 

"Jooo 'lilt 'noli "Si? - 0700 - 060 * - 050 ^ -0380 

0003 "0250 'MIR "iiSS "??" - 0497 - 0527 - 0A18 

rS? fl?72 " - 1195 - 1265 - 1113 -0844 .0576 .0469 

.8542 .8340 .7467 .5657 .3556 .2252 .2511 .5290 !9271 

3,0 *2482 'lilt 'llnl -??£g - 0825 -0671 .0500 .0293 

^**oz . itOo .1509 .1367 . llfiQ no^l met ncoi 
•o°L 6 -SS? -88 : ?1 t! :SS :SSS

3 ' 6 "?7Qfi 'ii 7 ^ 'if -i 2 f 6 - 1051 -°864 .0685 .0499 .0287 

*0325 'nUZ - 2 225 - U97 - 1121 -° 8 ^ 0619 0418 

*?704 "Ssii '2??s "i 1 ™ ? •, 108A - 0979 - 0799 - 0508 

- 8820 - 801? 'llll 'J?™ "il\ - 1085 - 0921 - 0799 -0659 

4 2 lssq iSI? '-f??n "*?H 2 - 0363 - 3870 -6779 .8788 .9800 

'4989 "29ol '7U7 -J?52 -? 87 ? - 07 * 0 « 0596 -°"0 0254 

2421 21? J 'ioH •}?!? '} 2 2 A - 0970 - 0761 -0581 .0402 

6425 -^60 "inm 'KS ' 133/ ■ 1098 -° 82 ^ -0484 

-1039 oSffl - 1906 - 1558 -1270 .1011 .0735 

.1039 .0848 .2534 .4116 .5615 .7014 .3263 .9259 .9858 

4,8 'llll Isno Tsii -i°o?« -°^ 7 - 0A22 - 0326 - 019 * 

3177 2661 ol\l •?!!? '° 669 - 0572 -°^ 79 -0355 

7817 "fins^ ,2 «? -JSSS - 1576 - 1289 -i 008 -0719 .0398 

./oi/ .6053 .4352 .3239 .2481 1929 ILQf, n?/. n^o 

- A -.8598 -.7485 -.4864 -.1223 2?05 .5583 7793 "Jim *§858 

a 0038 -S?87 -0 0 867 'ISS 'ISS 1 0X7 1 '.till 

1223 19?? -? 827 - 0665 - 0A " -0386 .0330 .0266 

2759 ' "2oin -}Itx '!¥,l -? 8 1 A - 0639 -°^ 8 -0240 

till 'illl -Sf? 'J 923 'H A8 4529 .1276 .0989 .0646 
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 6, Theta Zero: .80, Mastery Score: 5

Test KR21
Mean    .100    .200    .300    .400    .500    .600    .700    .800    .900

0.6    .0002   .0095    .con   .0024   .0047   .0085   .0135   .0179   .0166
       .0032   .0067   .0128   .0227   .0373   .0531   .0600   .0500   .0356
       .0000   .0000   .0000   .0000   .0000   .0CA2   .0008   .0025   .0043
       .0000   .0000   .0000   .0000   .0005   .0031   .0093   .0153   .0120
       .9653   .9659   .9641   .9752   .9774   .9750   .9605   .8979   .7463

1.2    .0028   .0047   .0077   .0123   .0189   .0273   .0356   .0395   .0^20
       .0242   .0351   .0499   .0688   .0873   .0953   .0856   .0632   .0493
       .0000   .0000   .0000   .0000   .0002   .0009   .0030   .0065   .0090
       .0000   .0000   .0000   .0004   .0032   .0105   .0198   .0233   .0150
       .9151   .9131   .9443   .9516   .9537   .9444   .907?   .7858   .7238

1.8    .UIjU   .UZUj   .0278   .0373   .0483   .0586   .0647   .0623   .0452
       .0784   .0964   .1173   .1347   .1373   .1212   .0943   .0714   .0608
       .0000   .0000   .0000   .0002   .0010   .0033   .0074   .0122   .0137
       .0000   .0000   .0004   .0037   .0127   .0243   .0315   .0278   .0166
       .9990   .9058   .9093   .9165   .9069   .8692   .7719   .5908   .7653

2.4    .0494   .0597   .0720   .0848   .0950   .0997   .0966   .0832   .0550
        1HOC   .1383   .2003   .1884   .1576   .1241   .0992   .0846   .0715
       .0000   .0000   .0002   .0012   .0041   .0089   .0147   .0193   .0180
       .0000   .0002   .0050   .0185   .0340   .0425   .0401   .0282   .0186
       .9990   .8443   .8550   .8341   .7672   .6368   .4591   .4111   .8560

3.0    .1219   .1360   .1482   .1540   .1518   .1418   .1242   .0982   .0605
       .2787   .2800   .2373   .1886   .1572   .1382   .1222   .1042   .0806
       .0000   .0002   .0019   .0063   .0128   .0196   .0251   .0269   .0215
       .0000   .0080   .0346   .0569   .0631   .0557   .0409   .0264   .0223
       .8464   .7519   .6711   .4851   .2617   .1179   .1436   .4732   .9341

3.6     OACC     non   .ZhZL   .2227   .1980   .1700   .1390   .1038   .0609
       .3604   .2757   .2645   .2527   .2252   .1912   .1563   .1222   .0867
       .0001   .0041   .0128   .0224   .0304   .0355   .0370   .0336   .0233
       .0129   .0893   .1145   .1005   .0758   .0526   .0363   .0292   .0275
       .5758   .1153  -.2963  -.3877  -.3160  -.1010   .2977   .7586   .9726

4.2    .3979   .3379   .2862   .2427   .2043   .1686   .1336   .0970   .0553
       .8441   .7126   .5174   .3832   .2908   .2237   .1720   .1290   .0878
       .0133   .0356   .0484   .0542   .0552   .0525   .0466   .0372   .0229
       .3740   .2439   .1396   .0866   .0620   .0511   .0454   .0404   .02.•
      -.8941  -.8488  -.6809  -.3649   .0691   .4753   .7553   .9154   ,98h*

4.8    .2585   .2273   .2014   .1774   .1539   .1300   .1047   .0765   .0434
       .7921   .4988   .3665   .2862   .2300   .1870   .1513   .1182   .0816
       .1302   .1121   .0975   .0846   .0724   .0603   .0480   .03A?   .0194
       .3959   .2419   .1742   .1339   .1060   .0850   .0677  .052',   .0360
       .0162   .2124   .3780   .5258   .6592   .7770   .8756   .9489   .9904

5.4    .0108   .0381   .0564   .0657   .0678   .0644   .0561   .0432   .0249
       .4112   .3768   .2574   .1799   .1367   .1142   .100^   .0870   .0641
       .1152   .0997   .0836   .0693   .0566   .0450   .0.39   .0231   .0121
       .2438   .2497   .1986   .1546   .1209   .0944   .0724   .0527   .0329
      -.4671  -.5964  -.4193  -.1209   .2AA&   .5805   .8113   .9369   .9896
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 7, Theta Zero: .60, Mastery Score: 5

Test KR21
Mean    .100    .200    .300    .400    .500    .600    .700    .800    .900

0.7    .0005   .0011   .0023   .0044   .0073   .0105   .0130   .0133   .0098
       .0070   .0131   .0224   .0337   .0420   .0423   .0350   .0259   .0209
       .0000   .0000   .0000   .0001   .0007   .0026   .0062   .0107   .0120
       .0000   .0000   .0002   .002S   .0122   .0274   .0400   .0389   .0256
       .9514   .9519   .9716   .9743   .9700   .9528   .9010   .7702   .8306

1.4    .0073   .0109   .0160   .0218   .0271   .0306   .0310   .0273   .0181
       .0494   .0050   .0792   .0816   .0720   .057]   .0439   .0356   .0289
       .0000   .0000   .0002   .0015   .0052   .011?   .0194   .0256   .0239
       .0001   .0003   .0061   .0248   .0487   .0652   .0660   .0508   .0343
       .9023   .9271   .9349   .9255   .8920   .8186   .6906   .6075   .8701

2.1    .0361   .0447   .0528   .0577   .0587   .055*   .0496   .0395   .0245
       .1413   .1501   .1288   .0995   .0781   .0646   .0548   .0456   .0343
       .0000   .0003   .0032   .0103   .0204   .0313   .0401   .0432   .0347
       .0001   .0139   .0573   .0939   .1068   .0990   .0784   .0549   .0417
       .8490   .8595   .8113   .6948   .5328   .3975   .3773   .5727   .9181

2.8    .1077   .1135   .1101   .1015   .0903   .0777   .0637   .0478   .0284
       .2410   .1644   .1400   .1255   .1077   .0889   .0710   .0543   .0376
       .0004   .0083   .0239   .0405   .0542   .0632   .0662   .0610   .0434
       .0315   .1672   .2033  .180(5   .1432   .1074   .0788   .0602   .0498
       .7106   .3273  -.0637  -.1646  -.1013   .0748   .3601   .7115   .9530

3.5    .2125   .1766   .1479   .1246   .1046   .0863   .0688   .0507   .0298
       .5349   .3978   .2731   .1945   .1428   .1064   .0792   .0577   .0337
       .0288   .0690   .0914   .1021   .1049   .1014   .091?   .0755   .0488
       .7127   .4349   .2664   .1810   .1351   .1076   .0887   .0735   .0578
      -.8578  -.7749  -.5723  -.2838   .0454   .3639   .6381   .8477   .9712

4.2    .1583   .1371   .1204   .1055   .0915   .0777   .0634   .0476   .0283
       .4857   .2912   .2053   .15AH   .1188   .0925   .0717   .0541   .0375
       .2388   .2140   .1927   .1/25   .1525   .1316   .1089   .0829   .0500
       .6481   .4054   .2959   .2>91   .1819   .1457   .1159   .0899   .0643
      -.1233   .0509   .2063   .3553   .5000   .6412   .7761   .8954   .9777

4.9    .0170   .0411   .0536   .053J   .0589   .0556   .0491   .0391   .0243
       .4301   .2630   .1582  .lOt'l   .0823   .0667   .0553   .0453   .0340
       .3266   .2742   .2315   .1963   .1655   .1369   .1090   .0798   .0462
       .7747   .5890   .4150   .3031   .2279   .1740   .1327   .0988   .0670
      -.8707  -.7688  -.5286  -.1951   .1534   .4581   .7006   .8762   .9763

5.6    .0001   .0035   .0110   .0191   .0257   .0296   .0303   .0268   .0180
       .0107   .0794   .1060   .0974   .0780   .0588   .0441   .0352   .0285
       .1570   .1623   .15*3   .1436   .1272   .1085   .0877   .0644   .0368
       .2986   .2396   .2133   .1941   .1703   .1445   .1187   .0928   .0637
       .5542   .2489  -.0501  -.1188  -.0352   .1669   .4729   .7912   .9675

6.3    .0000   .0000   .0007   .0027   .0060   .0097   .0125   .0130   .0097
       .0000   .0025   .0169   .035C   .0449   .0440   .0354   .0257   .0205
       .0330   .0413   .0494   .0551   .0569   .0544   .0477   .0368   .0213
       .1496   .1645   .1570   .1315   .1064   .0900   .0797   .0690   .0509
       .9156   .8510   .8310   .7547   .6201   .4770   .4544   .6780   .9499
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 7, Theta Zero: .60, Mastery Score: 6

Test KR21
Mean    .100    .200    .300    .400    .500    .600    .700    .800    .900

0.7    .0000   .0001   .0003           aUU1J           .UUJZ   .0035   .002/
       .0006   .0018   .0041   .0076   .0107   .0116   .0099   .0071   .0056
       .0000   .0000   .0000   .0002   .0012   .0044   .0112   .0204   .0250
       .0000   .0000   .0003   .0046   .0203   .0484   .0759   .0813   .0556
       .9690   .9702   .9855   .9870   .9838   .9717   .9321   .8045   .7812

1.4    .0008   .0015   .0028   .0045   .0062   .0075   .0080   .0073   .0050
       .0083   .0138   .0200   .0226   .0208   .0166   .0124   .0096   .0078
       .0000   .0000   .0003   .0024   .0086   .0200   .0354   .0498   .0504
       .0000   .0004   .0096   .0405   .0839   .1196   .1310   .1097   .0728
       .9339   .9623   .9680   .9614   .9384   .8823   .7624   .6141   .8118

2.1    .0053   .0036   .0116   .0137   .UAH/           .U1JZ   .0107   .000/
       .0377   .0456   .0410   .0311   .0231   .0130   .0148   .0123   .0094
       .0000   .0005   .0051   .0163   .0348   .0557  .0-5?w   .0860   .0742
       .0002   .0215   .0923   .1594   .1929   .1921   .1641   .1195   .0864
       .9154   .9327   .9057   .8298   .6931   .5254   .4157   .4997   .8708

2.8    .0242   .0280   .0285   .0269   .0244   .0212   .0175   .0132   .0079
       .0941   .0581   .0407   .0344                   .U17J   .U1DU
       .0006   .0129   .0387   .0681   .0950   .1160   .1276   .1243   .0940
       .0477   .2665   .3442   .3275   .2790   .2228   .1684   .1246   .1022
       .8656   .6563   .2559   .0266  -.0129   .0669   .2596   .5915   .9249

3.5    .0619   .0514   .0427   .0357   .0298   .0245   .0194   .0142   .0083
       .1576   .1219   .0832   .0585   .0423   .0310   .0228   .0164   .0109
       .0443   .1110   .1535   .1788   .1916   .1934   .1834   .1581   .1075
      1.1093   .7555   .5047   .3627   .2750   .2157   .1737   .1441   .1204
      -.7663  -.7257  -.5636  -.3498  -.0994   .1816   .4838   .7697   .9574

4.2    .0528   .0439           .UJZO    H979  .\JLll  .1)18J   .0135   .0080
       .1872   .1045   .0700   .0504   .0375   .0284   .0214   .0157   .0107
       .3871   .3653   .3442   .3210   .2947   .2640   .2267   .1790   .1121
       .9524   .6240   .4733   .3798   .3131   .2611   .2178   .1790   .1376
      -.2560  -.1053   .0442   .2003   .3649   .5363   .7081   .8639   .9715

4.9    .0060   .0138   .0174   .0185   .0181   .0167   .0144   .0113   .0069
       .1472   .0824   .0468   .0317   .0245   .0201   .0166   .0134   .0098
       .6173   .5314   .4610   .4011   .3464   .2933   .2387   .1785   .1055
      1.2565   .9609   .6999   .5330   .4193   .3352   .2679   .2090   .1437
      -.9406  -.8448  -.5976  -.2289   .1428   .4504   .6909   .8686   .9740

5.6    .0000   .0012  .L.UJ/  .ni)62   .0081   .0091   .0091   .0078   .0051
       .0033   .0270   .0343   .0299   .0230   .0169   .0127   .0103   .0082
       .4183   .4055   .3776   .3407   .2938   .2534   .2044   .1493   .0855
       .4201   .4402   .4529   .4259   .3781   .3240   .2688   .2121   .1470
      -.0564  -.3295  -.4362  -.3779  -.2064   .0786   .4530   .7939   .9675

6.3    .0000   .0000   .0002   .0009   .0020   .0030   .0038   .0038   .0028
       .0000   .0009   .0057   .0113   .0139   .0131   .0102   .0074   .0060
       .1567   .1632   .1675   .1665   .1584   .1429   .1199   .0895   .0507
       .3075   .3096   .2930   .2650   .2411   .2222   .2014   .1712   .1221
       .6254   .4923   .4601   .3500   .2305   .1881   .3103   .6553   .9507
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Treble of the Falje Positive Error and its 
S.E.*SQRT(M), the False Negative Error and its

S.E.*SQRT(M), and the Correlation between FP and FN 

Number of Items: 7, Theta Zero: .70, Mastery Score: 5 

Test KR21- 

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900

0,7 '2225 - 00U - 0023 -0045 .0079 .012s"""oi82"""o214"""oi79 

.0070 .013. .0225 .0363 .0529 .0649 .0638 .0492 .0367 

,0000 .0000 .0000 .0000 .0001 .0004 .0015 .0035 .0049 

'222? -222° 10000 - 0002 -0016 .0061 .0132 .0169 .0117 

i l ,9 nni\ -2?£ -2222 - 9719 - 9660 - 9421 - 8566 - 78 * 2 

'2°S '2J2? -S"! - 0232 - 0319 - 0 * 06 -0465 -0460 .0339 

• - 0845 - 103A - 1120 - 10 * 5 -0842 .0631 .0512 

.0000 .0000 .0000 .0001 .0006 .0022 .0052 .0089 .0101 

•2222 -2222 -222? - 0023 - 0089 - 018 7 .0258 .0240 .0150 

.9033 .9317 .9286 .9356 .9304 .9053 .8377 .6988 .7995 

2.1 .0361 .0450 .0558 .0672 .0769 .0821 .0806 .0702 .0469 

'mm - 1701 ' U65 - 1172 - 0927 -0765 .0624 

•2222 -2222 - 000 ' - 0010 • 033 -oo?3 .0122 .oieo .0151 

'SSr? 'iill 'Hit - 287 '° 371 - 0362 -0267 0175 

9 fi '?nr •???? -?5Sg - 8076 - 711 * .5755 .5260 .8576 

* -BK 'J 3 ?? -} 382 .1367 .1231 .1126 .0896 .0559 
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'?!?? -???2 -2S 2 ? -°i 23 - 0391 - 0340 -o 27 6 01 99 iono 
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Table of the False Positive Error and its 
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN 
Number of Items: 7, Theta Zero: .70, Mastery Score: 6 
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c L £ 0 the Fal8e Positive Error and its 
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 7, Theta Zero: .80, Mastery Score: 6 



Test KR21- 
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.0348 
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.0693 
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.0479 
.7816 
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.0713 
.3276 
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.0430 
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-.0998 
.1406 
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.0686 
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.0569 
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.0.126 
.03*1 
.9582 
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.0215 
.0509 
.0177 
.0441 
.9870 
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Table of the False Positive Error and its 
S.E.*SQRT(M), the False Negative Error and its 
S.E.*SQRT(M), and the Correlation between FP and FN 
Number of Items: 8, Theta Zero: .60, Mastery Score: 5 
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. 0245 


.0199 


.0121 






.0602 


.0773 


.0847 


.0766 


.0618 


.0484 


.0399 


.0343 


.0264 






.9041 


. 9290 


.9261 


.8964 


.8271 


.7067 


.5894 


.6585 


.9292 
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Table of the False Positive Error and its 

S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN

Number of Items: 8, Theta Zero: .60, Mastery Score: 6

Test KR21- 

Mean .100 .200 .300 .400 .500 .600 .700 .800 .900

0,8 *Sn?J -222I - 00 2? .ooirVoon"""oo5r"oo7r"oo8i""oo65 

.0016 .0039 .0081 .0148 .0220 .0255 0232 )171 0128 

.0000 .0000 .0000 .0000 .0005 .0022 .0062 !oi22 !oi56 

.0000 .0000 .0001 .0016 .0093 .0258 0447 0504 0342 

.9623 .9610 .9789 .9830 .9817 .9732 9452 8521 ftofin 

1.6 .0022 .0039 .0065 .0100 .0139 0171 !oi86 01U "o?22 

.0196 .0294 .0409 .0480 .0466 .0390 0296 0224 0179 

.0000 .0000 .0001 .0011 .0045 0114 0212 0309 0317 

.0000 .0001 .0036 .0197 .0468 0720 .0819 !o691 ^445 

.9195 .9456 .9563 .9539 .9368 .8944 .8024 ollt .8281 

2,4 '21o5 - 0210 - 0271 - 0319 -° 3 ^ -0343 .0316 .0260 0167 

.0788 .0923 .0876 .0705 .0539 .0422 0344 0283 0216 

.0000 .0002 .0024 .0093 .0206 0344 .0473 !c545 0468 

.0001 .0086 .0488 .0953 .1218 .1235 1050 0752 0528 

.9122 .9060 .8876 .8241 .7099 .5670 4693 5460 8827 

3 * 2 'JS? 'SSL 1 '° 638 '° 581 - 0509 0423 ;o325 :§196 

.1767 .1237 .0906 .0776 .0670 .0561 0452 0347 0241 

.0002 .0069 .0235 .0434 .0617 0757 0828 WJ 'o594 

.0194 .1575 .2246 .2191 .1854 .1450 1075 0788 0627 

.8087 .6220 .2582 .0398 .0057 .0988 .3092 !6364 !9326 

4,0 'lill -KS? *?S 4 ? -2ZH - 0588 - 0469 - 03A7 - 0206 

•212° .1831 .1313 .0962 .0713 .0527 .0379 .0251 

.0277 .0756 .1060 .1230 .1302 .1292 1201 1016 nfi77 

.7585 .5392 .3504 .2435 .1808 1413 ilUO 15933 !o737 

-.7690 -.7415 -.5728 -.3310 -.0420 .2639 .5563 !8026 19606 

4 * 8 'Mil 'o,li '? 8 ^ -?SJ '° 635 '° 537 - 0437 -0328 .0197 

.3648 .2132 .1474 .1088 nMi oa^i; n/.o/. 
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,2132 - 1474 .1088 .0827 .0635 .0484 !o360 *0245 

.2942 .2688 .2459 .2233 .2000 .1750 1469 1136 070? 

.7609 .4845 .3589 .2815 2264 1834 1474 1153 '§833 

-.1750 -.0072 .1470 .2972 .4464 5949 7402 8735 9713 

* 6 -2?57 '° n fS °T 79 : ° 336 .'0269 \Vu\ 

• 1832 .1109 .0749 .0564 .0452 .0370 0299 0222 

.4197 .3591 .3071 .2629 .2235 1865 1499 1110 0654 

I70I -S27I 'l 0 ^ '™l * 2892 « 2233 -1718 1288 :0879 

-.8705 -.7914 -.5732 -.2519 .0980 .4108 .6645 .8550 .9705 

6,4 'nS?? "2S? -S2S? -2i3-g - 0163 - 0193 -0202 .0182 .0125 

'2114 AM -SSJ? - 0A04 - 0300 - 0232 -0185 

,2152 - 2083 - 1931 .1726 .I486 .1213 0901 0523 

.3364 .2919 .2638 .2428 .2160 1854 1537 1212 0841 

.4526 .2347 -.0320 -.1131 -.0492 1308 4208 7532 9593 

7.2 .0000 .0000 .0003 .0014 .0034 0058 0079 0086 0067 

.0000 .0008 .0077 .0190 .0272 0286 .0241 !oi73 '§133 

•?782 '?Sv "JS?? '5Z8 '°Jn 5 5 '° 733 > 0653 ^5U ^304 

■9990 'aiiq 'Si? *i 4 22 - 1 , 179 - 1030 - 0895 - 0672 

^ 9 * 90 - 8148 -8081 .7535 .6^69 .5167 .4633 .6382 .9361 
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Table of the False Positive Error and its 
S.E.*SQRT(M), the False Negative Error and its 
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 8, Theta Zero: .60, Mastery Score: 7 



Test KR21- 





.100 .200 .300 .400 .500 .600 .700 .800 .900


0.8 


.0000 


.0000 


.0001 


.0002 


.0006 


.0011 


.0017 


• 0020 


• 0016 




.0001 


.0004 


.0013 


.0029 


.0050 


.0063 


.0060 


• 0044 


.0032 




. UUUU 


. 0000 


o ft ft ft 

• 0000 


• 0001 


• 0007 




.0098 


• 0204 


• 0285 




. UUUU 


. 0000 


nnm 

• 0001 


. 0023 


m o £ 

. 0136 


• 0397 


.0736 


• 0913 


.0669 




. /41 


• 9732 


.9832 


ft ft n ^ 

. 9907 


ft ft ft / 
. 9894 


f\ r% ^ ^ 
• 9827 


. 9601 


• 8730 


.7600 


1.6


• 0C02 


, 0005 


A A1 A 

.0010 


• 0019 


• 0029 


.0039 


.0044 


.0043 


.0031 




. UU27 


n n c ft 

.0052 


f\ f\ oft 

.0089 


. 011 6 


• 0120 


. 0103 


.0077 


• 0057 


.0045 




• UUUU 


nnnn 
• UUUU 


nnm 
• UUU1 


nn c 

• 00x5 


• 0065 


. 0173 


A A / A 

.0340 


.0526 


.0587 




. UUUU 


nnm 
. UUU1 


n n c n 
• 0050 


• 0283 


.0703 


. 1147 


.1407 


. 1301 


• 0855 




. 94oo 


n ^ n / 

. 9694 


ft ^ £ ^ 

.9766 


ft h i o 

• 9742 


ft £ 1 o 

. 9613 


a o n a 

. 9280 


.8458 


. 6873 


.7596 


2.4 


.0022 


.0036 


.0054 


.0069 


.0079 


.0082 


.0078 


• 0065 


.0042 




.0172 


.0235 


.0241 


• 0197 


.0148 


.0111


.0087 


.0071 


.0055 




nnnn 

• UUUU 


nnn9 

• UUUZ 


nooA 


m OA 


• U JUo 


acq/: 

. Uj Jo 


n t T£ 

. 0776 


n ft c n 

. 0950 


n o o o 
• 0882 




nnm 
• uuux 


• UllO 


nr. 0 9 


• XHlJ 




0H7 1 

• ZU /I 


i ft n o 

. 1903 


• 1454 


AAA/ 

. 0984 




Q95A 


Q ^n k 


Q^QI 


CO A A 


• oU / Z 


• OOjZ 


Q1 Q£ 

• j1 jo 


A ft 1 C 

• 4915 


.8157 


3.2


nn 7 

• U X X / 


m A A 


hi 57 


• UlJJ 


m A A 


m o 7 
• U1Z / 


m n7 
• U1U/ 


nn o o 
• UUoZ 


AACA 

. 0050 




.0570 


.0400 


.0257 


.0201 


.0169 


• 0141 


• 0114 


• 0088 


• 0062 




.0003 


.0096 


.0335 


.0641 


.0948 


• 1215 


.1397 


• 1425 


.1138 




n?fii 

• UZQl 


991 


. J JZ1 


OAA1 
• JHHL 


• Jllj 


. zouy 


on o A 
• 2034 


• 1482 


• 1142 




• 7U10 


• oUZ? 


CO/, /, 

. jZ44 


• 24 JU 


i i An 
• 1140 


• lloU 


1 o o c 

• 2325 


• 5049 


o o o o 

. 8883 


4.0


• u Joy 


. UJ14 


n. o£0 
• UZo J 


n oo 1 
• QZZ1 


m o c 

• 0185 


m c O 

• 0153 


• 0121 


• 0090 


.0053 




np.ni 

• UOUx 


• U / UZ 


• U*f7 / 


• UJ JH 


• UZj/ 


n i o o 


m o o 

. 0138 


• 0099 


. 0065 




n^7o 


1 n7A 

• 1U / H 




1 QQ7 


OAQ1 

. ZUol 


• 2157 


"n no 

.zlOz 


• 1867 


1 ft ft n 

. 1321 




X • Uj / J 


• ol J J 


• j /zu 


A 9 1 A 
• HZ J4 


O O C ft 

• 3Z59 


• 2557 


on o c 

. 2026 


T /* ft ft 

• 1632 


*i ft c r\ 

.1350 




- A1 AA 


■ • roUO 


C /.Q1 

- . j491 


- • Jooy 


- • 1jo4 


n nn / 

. 0904 


. 3788 


s t\ ft s 
. 6936 


t\ ft T f\ 

.9379 


4.8 


.0342 


.0281 


.0238 


• 0203 


.0172 


.0143 


.0115 


.0086 


.0051 




.1249 


.0682 


.0450 


.0320 


.0236 


.0176 


.0132 


.0096 


.0064 




A 91 £ 
• 4Z10 


A fi 0 ft 
. 4U J9 


• JojU 


• 3637 


.3385 


. 3077 


. 2688 


.2165 


.1396 




x • UZuf 


£Q1 O 


. )ZU/ 


A on o 
. 4203 


o / o n 

• 3480 


om o 

• 2913 


.2438 


.2014 


.1569 




- 0n79 
- • JU / Z 


i 7£n 
- . 1/oU 


n /. nn 
- • U4UU 


i n o n 
• 1080 


O 7 1 O 

.2713 


/ r a A 

. 4503 


.6399 


.8234 


.9606 


5.6


nnoi 

• UUJ1 


nn on 
. 0080 


nine 
• U1U j 


m i o 
. 0113 


m l o 

• 0112 


. 0104 


.0090 


. 0071 


.0044 




n<^i 


• l' j/J 


n onn 
• U JUU 


n onn 

• uzuu 


m c o 

.0152 


A1 O A 

. 0123 


.0101 


.0081 


.0059 




• o9oy 


• olUb 


• J JOZ 


• 4714 


.4112 


A C 1 A 

. 3519 


.2898 


.2199 


.1327 




1 1 n A 0 


no o •? 
. 9c 0/ 


• 1 nil) 


• 5779 


• 4617 


. 3742 


.3023 


.2390 


.1724 




- • v4jo 


- • d/Z J 


r tin r 

- , bozo 


o o o c 
- • 3226 


ft / £ O 

• 0468 


. 3688 


. 6j11 


. 8352 


.9653 


6.4 


noon 


nnos 


nm o 


nn^A 

• uu JH 


nnA7 


nn^A 

• UU DH 


nn ^ ^ 

• UU J J 


nn/. q 

. uu^y 


nnti 
. UU J J 




.0012 


.0131 


.0193 


.0180 


.0143 


.0107 


.0079 


.0062 


.0049 




.4388 


.4743 


.4457 


.4063 


.3600 


.3085 


.2515 


.1868 


.1086 




.4306 


.4538 


.4752 


.4590 


.4171 


.3640 


.3066 


.2452 


.1726 




-.1452 


-.3243 


-.4345 


-.3971 


-.2513 


.0049 


.3680 


.7411 


.9566 


7.2 


.0000 


.0000 


.0001 


.0004 


.0010 


.0017 


.0022 


.0023 


.0018 




.0000 


.0002 


.0023 


.0056 


.0077 


.0078 


.0064 


.0045 


.0035 




.1923 


.1972 


.2005 


.1992 


.1908 


.1739 


.1479 


.1122 


.0647 




.3309 


.3331 


.3233 


.2999 


.2749 


.2531 


.2303 


.1983 


.1444 




.9990 


.4131 


.4157 


.3476 


.2511 


.1988 


.2770 


.5860 


.9330 
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Table of the False Positive Error and its 

S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN

Number of Items: 8, Theta Zero: .70, Mastery Score: 6

Test KR21- 
Mean . 100 



.200 .300 .400 .500 .600 .700 .800 .900 



0.8



1.6 



2.4 



3.2



4.0 



4.8



5.6



6.4 



7.2 



.0001 0003 

.0016 .0039 

.0000 .0000 

.0001 .0000 



.9658 
.0022 
.0196 
.0000 
.0000 
.9205 



.9642 
.0039 



.0007 
.0082 
.0000 
.0000 
.9619 
.0065 



.0294 .0430 

.0000 .0000 

.0000 .0001 

.9269 .9483 



.0016 
.0158 
.0000 
.0001 
.9793 
.0107 
.0596 
.0001 



.0C34 
.0275 
.0000 
.0010 
.9816 
.0166 
.0730 
.0005 



.0014 .0075 
.9561 .9558 



.0065 
.0399 
.0003 
.0052 
.9795 
.023f 
.0756 
.0021 
.0192 
.9435 



.0106 .0142 

.0453 .0378 

.0014 .0040 

.0141 .0219 

.9676 .9184 



.0298 
.0651 
.0056 
.0311 
.9053 



.0155 .0211 

.0788 .0968 

.0000 .0000 

.0000 .0001 



.9497 
.0591 
.1853 
.0000 
.0000 
.*z42 



.8958 
.0702 
.1989 
.0001 
.0046 
.8289 



.0285 .0375 .0466 .0534 .0557 

.1150 .1222 .1127 .0926 .0710 

.0001 .0007 .0030 .0077 .0143 

.0024 .0128 .0297 .0442 .0483 

.9108 .9088 .8840 .8254 .7158 

.0898 .0931 .0908 .0825 

.1181 .0964 .0809 

.0124 .0208 .0284 

.0710 .0699 .0569 

.5710 .4221 .3564 



.0132 
.0260 
.0066 
.0169 
.7981 
.0322 .0256 
.0476 .0358 
.0108 .0138 
.0333 .0213 
.7990 .7740 

.0512 .0361 
.0548 .0441 
.0206 .0211 



.0815 
.1837 
.0014 
.0283 
.8037 



.1493 
.0055 
.0562 
.7149 



.0388 
.5980 
. 676 
.0671 



.0239 
.8129 
.0436 
.0512 



.0324 .0276 
.0389 .0274 
.4961 .8876 



''006 *2283 'l 1 ^ * 17^/ - 1018 '° 777 - 0 * 73 

,y)0b .2283 .1883 .1714 .1517 .1286 1044 nftnfi nsfi-* 

.0001 .0037 .0112 .0249 .0355 0433 0468 0444 ml 

• C087 .0874 .1294 .1249 .1021 5764 .0545 otoo 'Mil 
.6774 .4312 .0392 -.1386 -.1322 .0056 .2768 .6571 

.3037 .2600 .2211 .1881 .1590 .1321 

.5887 .5190 .3782 .2779 .2080 .1571 

.0148 .0431 .0607 .0698 .0728 .0710 

'IIH - i9A7 - 1283 - 0927 -0727 

-.8272 -.7993 -.6354 -.3670 -.0233 .3267 

.2163 .1902 .1688 .1493 .1306 .1116 

.6392 .3932 .2827 .2157 .1690 .1336 

• -1552 .1377 .1217 .1062 .0907 
.4995 .3058 .2197 .1678 .1317 .1043 

-.1046 .0726 .2289 .3758 .5173 .6543 

.0120 .0395 0578 .0676 .0708 

.4015 .3366 .2249 .1572 .1178 

.1993 .1725 .1466 .1240 .1040 

.3767 .3574 .2705 .2031 .1546 

-.6497 -.6773 -.4938 -.2202 .0976 

.0000 .0011 .0056 .0126 .0198 

.0011 .0362 .0815 .0982 .0921 

.0466 .0548 .0593 .0592 .0554 

.1774 .1624 .1247 .1003 

.8038 .7707 .6163 .4106 



.1059 .0785 

.1181 .0865 

.0647 .0534 

.0599 .0497 

.6243 .8429 

.0915 .0690 

.1050 .0802 

0743 .o559 

0821 .0628 

7840 .8982 



.0871 
.2788 



.0689 .0621 .0502 

.0937 .0773 .0641 

.0854 .0673 .0486 

.1187 .0907 .0673 

.4098 .6764 .8702 

.0252 .0276 .0255 

0752 .0570 .0443 

0489 .0402 .0296 

0773 .067.: .0547 

2913 .4740 .7722 



.9439 

.0467 
.0581 
.0348 
.0391 
.9697 
.0413 
.0557 
.0334 
.0440 
.9777 

.0314 
.0486 
.0276 
.0447 
.9757 
.0173 
.0359 
.0166 
.0377 
.9653 
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Table of the False Positive Error and its 
S.E.*SQRT(M), the False Negative Error and its 
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 8, Theta Zero: .70, Mastery Score: 7 



Test KR21- 



Mean 


.100 


.200 


.300 


.400 


.500 


.600 


.700 


.800 


.900 


0.8 


.0000 


.0000 


.0001 


.0003 


.0007 


.0016 


.0030 


.0044 


.0043 






. 0001 


.0004 


.0013 


.0033 


.0072 


.0120 


.0149 


.0129 


.0036 






.0000 


.0000 


.0000 


.0000 


.0001 


.0006 


.0029 


.0085 


.0154 






.0000 


.0000 


.0000 


.0001 


.0019 


.0102 


.0294 


.0500 


.0428 






.9769 


.9754 


.9755 


.9887 


.9902 


.9884 


.9796 


.9401 


.7917 


1.6


.0002 


.0005 


.0010 


.0021 


.0040 


.0064 


.0089 


.0102 


.0085 






.0027 


.0052 


.0097 


.0162 


.0226 


.0252 


.0226 


.0164 


.0118 






.0000 


.0000 


.0000 


.0001 


.0009 


.0040 


.0115 


.0237 


.0328 






.0000 


.0000 


.0001 


.0026 


.0143 


.0388 


.0674 


.0788 


.0540 






.94S4 


.9507 


.9719 


.9770 


.9761 


.9667 


.9373 


.8411 


.7384 


2.4


.0022 


.0036 


.0059 


.0091 


.0127 


.0158 


.0174 


.0166 


.0121 






. 0172 


. 0253 


. 0350 


.0411 


.0399 


.0332 


.0249 


.0183 


.0145 






.0000 


.0000 


.0001 


.0013 


.0058 


•> 


.0300 


.0460 


.0508 






.0000 


.0001 


.0043 


.0240 


.0585 




.1091 


.0955 


.0592 






.9242 


.9432 


.9542 


.9524 


.9350 


.8913 


.7944 


.6358 


.7526 


3.2


.0118 


.0162 


.0213 


.0256 


.0282 


.0286 


.0268 


.0224 


.0147 






.0604 


.0729 


.0712 


.0579 


.0439 


.0337 


.0270 


.0221 


.0171 






. 0000 


.0002 


.0025 


.0104 


.0243 


.0425 


.0612 


.0741 


.0676 






.0000 


.0082 


.0522 


.1092 


.1465 


.1547 


.1358 


.0979 


.0653 






.8985 


.9145 


.9010 


.8447 


.7329 


.5729 


.4322 


.4524 


.8341 


4.0


.0427 


.0494 


.0515 


.0499 


.0461 


.0407 


.0342 


.0263 


.0161 






.1417 


.1031 


.0711 


.0588 


.0510 


.0432 


.0352 


.0272 


.0191 






.0001 


.0066 


.0246 


.0482 


.0719 


.0916 


.1040 


. 1040 


.0809 






.0151 


.1596 


.2510 


.2592 


.2278 


.1822 


.1349 


.0958 


.0770 






.8432 


.7172 


.3363 


.1022 


-.0016 


.0350 


.2032 


. 5407 


.9161 


4.8


.1078 


.0926 


.0783 


.0662 


.0556 


.0^59 


.0366 


.0270 


.0160 






. 2095 


. 1940 


. 1413 


. 1026 


. 0757 


.0563 


.0417 


.0301 


.0200 






.0262 


.0800 


.1179 


.1416 


.1541 


.1569 


.1494 


.1290 


.0881 






.7887 


.6274 


.4255 


.2990 


.2197 


.1682 


.1341 


.1116 


.0934 






-.6800 


-.7355 


-.6071 


-.4091 


-.1523 


.1534 


.4820 


.7768 


.9593 


5.6 


.0892 


.0752 


.0648 


.0560 


.0479 


.0402 


.0325 


.0241 


.0143 






.2997 


.1725 


.1181 


.0867 


.0657 


.0504 


.0385 


.0287 


.0194 






.3332 


.3101 


.2879 


.2649 


.2401 


.2122 


.1797 


.1398 


.0862 






.8203 


.5295 


.3978 


.3173 


.2604 


.2163 


.1793 


.1454 


.1089 






-.2337 


-.0644 


.0983 


.2615 


.4259 


.5891 


.7453 


.8817 


.9747 


6.4


. 0052 


.0165 


.0233 


.0264 


.0270 


.0256 


.0226 


.0178 


.0109 






.1715 


.1332 


.0836 


.0565 


.0423 


.0340 


.0282 


.0232 


.0172 






.4701 


.4098 


.3538 


.3043 


.2592 


.2160 


.1725 


.1262 


.0726 






. 8019 


. 7454 


.5746 


.4448 


.3502 


.2782 


.2199 


.1684 


.1155 






-.8396 


-.7863 


-.5847 


-.2709 


.0909 


.4241 


.6891 


.8746 


.9760 


7.2




.0000 


.0005 


.0023 


.0051 


.0078 


.0096 


.0102


.0092 


.0061 






.0005 


.0154 


.0331 


.0380 


.0341 


.0268 


.0200 


.0158 


.0128 






.1923 


.1955 


.1915 


.1793 


.1610 


.1381 


.1112 


.0804 


.0445 






.3303 


.3107 


.2820 


.2626 


.2424 


.2173 


.1868 


.1498 


.1017 






.4109 


.3041 


.1101 


-.0118 


-.0051 


.1379 


.4302 


.7783 


.9674 
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Table of the False Positive Error and Its 

S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 8, Theta Zero: .80, Mastery Score: 7

Test KR21
Mean .100



.200 .300 .400 .500 .600 .700 .800 .900 



0.8 



1.6 



2.4



3.2 



4.0 



4.8



5.6 



6.4 



7.2 



.0000 
.0001 
.0000 
.0000 
.9684 
.0002 
.0027 
.0000 
.0000 
.9466 

.0022 
.0172 
.0000 
.0000 
.9238 
.0118 
.0604 
.0000 
.0000 
.9990 

.0428 
. 1468 
.0000 
.0000 
.9990 

.1190 
.2669 
.0000 
.0033 
. 7547 
.2562 
.3796 
.0109 
.391.' 
-.6566 • 

.1902 
.5973 
.2027 
.5447 
-.0954 
.0035 
.1806 
.1877 
.3106 
.2714 - 



.0000 
.0004 
.0000 
.0000 
.9738 
.0005 
.0052 
.0000 
.0000 
.9597 

.0036 
.0253 
.0000 
.0000 
.9263 
.0162 
.0761 
.0000 
.0000 
.9036 

.0523 
.1647 
.0000 
.0023 
.8593 

.1308 
.2187 
.0024 
.0677 
.6547 
.2262 
.4150 
.0407 
.3634 
-.7723 • 

. 1648 
.3640 
.1805 
.3416 
.1005 
.0196 
.2345 
.1694 
.3355 
.5545 ■ 



.0001 
.0013 
.0000 
.0000 
.9681 
.0010 
.0097 
.0000 
.0000 
.9939 

.0060 
.0369 
.0000 
.0000 
.9446 
.0224 
.0940 
.0000 
.0015 
.9187 

.0631 
.1638 
.0009 
.0216 
.3543 

.1351 
.1620 
.0111 
.1290 
.3362 
.1945 
.3257 
.0634 
.2430 
-.6745 - 

. 1445 
.2604 
.1613 
.2515 
.2723 
.0331 
.1769 
.1464 
.2875 
.4456 - 



.0003 
.0034 
.0000 
.0000 
.9825 
.0022 
.0174 
.0000 
.0001 
.9717 

.0098 
.0521 
.0000 
.0011 
.9544 
.0305 
.1060 
.0005 
.0105 
.9219 

.0729 
.1391 
. 0046 
.0536 
.3001 

.1314 
.1387 
.0241 
.1420 
.0335 ■ 
.1661 
. 2474 
.0772 
.1609 
-.4724 ■ 

.1264 
.1983 
.1434 
.1972 
.4303 
.0412 
.1266 
.1242 
.2331 
.2119 



.0007 
.0080 
.0000 
.0001 
.9881 
.0044 
.0293 
.0000 
.0011 
.9761 

.0154 
.0665 
.0004 
.0072 
.9565 
.0396 
.1033 
.0028 
.0295 
.9059 

.0790 
.1089 
.0121 
.0779 
.6840 

.1221 
.1248 
.0377 
.1252 

-.0906 
.1405 
.1885 
.0839 
.1115 

-.1639 

.1093 
.1556 
.1255 
.1592 
.5771 
.0445 
.0939 
.1035 
.1872 
.1073 



.0019 
.0168 
.0001 
.0012 
.9839 
.0083 
.0427 
.0004 
.0064 
.9759 

.0 '28 
.0720 
.0022 
.0212 
.9476 
.0477 
.0873 
.0081 
.0500 
.8597 

.0800 
.0852 
.0224 
.0843 
.5117 

.1086 
.1093 
.0492 
.0968 
-.0553 
.1166 
. 1441 
.0843 
.0837 
.222.3 

.0922 
.1236 
.1071 
.1300 
.7119 
.0437 
.0746 
.0838 
.1494 
.4508 



.0043 
.0276 
.0005 
.0067 
.9858 
.0140 
.0498 
.0023 
.0190 
.9665 

.0303 
.0641 
.0070 
.0385 
.9159 
.0522 
.0662 
.0168 
.0603 
.7559 

.0751 
.0697 
.0335 
.0729 
.3566 

.0916 
.0916 
.0562 
.0678 
.1479 
.0930 
.1095 
.0787 
.0690 
.5917 

.0745 
.0978 
.0874 
.1055 
.8307 
.0391 
.0632 
.0646 
.1171 
.7323 



.0079 
.0313 
.0023 
.0184 
.9683 
.0197 
.0427 
.0071 
.0330 
.9249 

.0345 
.0465 
.0151 
.0455 
.8150 
.0501 
.0489 
.0267 
.0513 
.5838 

.0634 
.0584 
.0412 
.0493 
.3889 

.0706 
.0726 
.0559 
.0469 
.5656 
.06C5 
.0808 
.0661 
.0603 
.8498 

.0550 
.0751 
.0651 
.0331 
.9256 
.0309 
.0541 
.0450 
.0873 
.9060 



.0098 
.0212 
.0061 
.0215 
.8498 
.0199 
.0277 
.0135 
.0277 
.7633 

.0293 
.0331 
.0218 
.0297 
.6960 
.0370 
.0395 
.0302 
.0300 
.7253 

.0420 
.0463 
.0377 
.0323 
.8454 

.0433 
.0520 
.0427 
.0404 
.9393 
.0401 
.0544 
.0434 
.0510 
.9752 

.0319 
.0514 
.0380 
.0589 
.9847 
.0184 
.0406 
.0242 
.0561 
.9840 
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Table of the False Positive Error and its 
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 9, Theta Zero: .60, Mastery Score: 6 



Test KR21 
Mean 



.100


.200


.300 


.400 


.500 


.600 


.700 


.800 


.900 


.0002 


.0005 


.0013 


.0027 


.0051 


.0086 


.0123 


0145 


0121 


.0033 


.0070 


.0134 


.0234 


.0350 


.0427 


.0413 


.0314 


.0227 


A AA A 

. 0000 


A A A A 

. 0000 


. 0000 


. 0000 


.0002 


.0010 


.0033 


.0070 


.0094 


A AAA 

. 0000 


A A A A 

. 0000 


.0000 


.0005 


.0040 


.0130 


.0250 


.0303 


.0208 


.9553 


A P" ^ f 

. 9576 


. 9732 


.9777 


.9782 


.9721 


.9507 


.8781 


.8140 


. UU jsJ 


A A T A 

. 0079 


.0121 


.0178 


.0243 


.0300 


.0332 


.0318 


.0229 


• U JO J 


A C A 1 

. 0503 


. 0666 


.0737 


.0794 


.0692 


.0538 


.0403 


.0316 


. UUUU 


AAOA 

. ouuu 


A A A A 

. 0000 


. 0004 


• 0022 


.0061 


.0121 


.0184 


.0194 


f\f\f\f\ 

. UUUU 


A A A A 
. UUUU 


A A 1 1 

. 0013 


A A A A 

• 0039 


a A / r 

. 0245 


.0411 


.0491 


.0425 


.0268 


0170 


O O O.G 

. yz jy 


A / A A 

. 9409 


A / A C 

.9426 


. 9301 


.8959 


.8213 


.7064 


.8260 


.0321 


.0405 


.0498 


.0576 


.0620 


.0622 


.0578 


.0482 


.0315 


.1291 


.1453 


.1429 


.1205 


.0954 


.0756 


.0615 


.0503 


.0384 


• UUUU 


A A A 1 

. 0001 


A A 1 1 

.0011 


.0048 


.0115 


.0202 


.0285 


.0333 


.0289 


• UUUU 


A A O 1 

. 003Z 


A A / A 

. 0240 


A c A a 

. 0533 


.0726 


.0759 


.0652 


.0465 


.0317 


noon 


. Buy / 


O C A A 

. 8600 


O A £ A 

. 8060 


. 7067 


.5814 


.4928 


.5599 


.8795 


1110 


i o n r 
. 1ZU j 


1 A 1 H 

. 1217 


. 1161 


. 1061 


.0933 


.0781 


.0600 


.0369 


.2556 


.1930 


.1534 


.1337 


.1169 


.0903 


.0802 


. 0618 


.0429 


.0001 


.0035 


.0133 


.0259 


.0379 


.0469 


.0515 


.0494 


.0368 


. UU / J 


AO/' C 

. Uouj 


. 1371 


1 O O / 

. 1384 


. 1177 


.0913 


.0670 


.0487 


.0376 


. /.JOU 


fZ t 1 A 

. jo JO 


A A A A 

. 2302 


. 0269 


- .0013 


.0998 


.3191 


.6438 


.9309 


. 2466 


one 

. 2115 


.1798 


.1529 


.1293 


.1076 


.0864 


. 0645 


.0388 


/. 7 O /. 


.4206 


a a f *% 
.3063 


A A / 1 

.2241 


.1667 


.1250 


.0931 


.0673 


.0446 


. 0162 


A / O A 

. 04C2 


.0687 


.0797 


.0838 


.0823 


.0757 


.0633 


.0420 


/Oil 

,4ol2 


*5 e a c 

. 3605 


• 2305 


.1565 


• 1148 


.0896 


.0722 


.0581 


.0442 


. 77vz 


7 ^ A / 

- . 7624 


- . 5941 


- .3363 


-•0234 


.2951 


.5797 


.8087 


.9589 


.1872 


.1640 


.1452 


.1284 


.1122 


.0961 


0792 




0370 


.5497 


.3338 


.2375 


• 1794 


.1390 


.1085 


.C839 


.0629 


.0431 


.2077 


1 O / O 

. 1843 


• 1648 


. 1468 


.1292 


.1113 


.0922 


.0706 


.0435 


. 5763 


.3552 


.2559 


. 1956 


.1534 


.1210 


.0945 


.0715 


.0496 


1 At\C 

. 1 Wo 


A O O C 

. 02o5 


1 7 A 1 

. 1 7 91 


. 3226 


.4631 


.6023 


.7392 


.8679 


.9680 


ni o/» 


• uJLj 


rv l C A 

. 0552 


.0639 


• 0668 


.0651 


.0592 


.0486 


.0315 


O O 70 


. Juzo 


1 A C A 

. 1962 


. 1352 


. 1006 


.0792 


.0640 


.0515 


.0386 


or i a 


0 0 R£ 

. ZZDo 


1 A A 1 

. 1922 


. 1636 


• 1383 


. 1148 


.0918 


.0679 


.0402 




/. /. /. R 


1 O A 1 

. 3291 


*i / 1 A 

. Z439 


1 A a. c 

. 1835 


.1392 


.1050 


.0771 


.0515 


7/. "7£ 
. 74/0 


- . 7303 


c / c / 

- . 5454 


-.2716 


.0433 


.3497 


.6172 


.8294 


.9641 


• 0000 


0018 


• UU / 7 


m £ s 

. UitJ 


fi 0 A ft 


on l 
. UJ11 


. 0340 


A 1 O 1 

.0321 


. 0230 


!0027 


.0508 


.0940 


.1024 


.0911 


.0729 


.0546 


.0408 


.0318 


.0964 


.1057 


.1086 


. 104'J 


.0967 


.0852 


.0710 


.0537 


.0319 


.2452 


.2078 


.1621 


.1375 


.^01 


.1033 


.0862 


.0685 


.0482 


.71C2 


.6264 


.3892 


.1985 


.1440 


.2166 


.4119 


.7090 


. 94 75 


.0000 


.0000 


.0003 


.0016 


.0044 


.0085 


.0125 


.0147 


.0121 


.0000 


.0005 


.0078 


.0239 


.0396 


.0466 


.0427 


.0316 


.0228 


.0125 


.0179 


.0244 


.0309 


.0356 


.0374 


.0355 


.0294 


.0183 


.0796 


.0985 


.1099 


.1042 


.0874 


.0696 


.0565 


.0480 


.^374 


.6261 


.9112 


.9137 


.8924 


.8375 


.7377 


.6223 


.6506 


.9173 



0.9 



1.8 



2.7 



3.6 



4.5 



5.4 



6.3 



7.2 



8.1 
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INFERENCE FOR ERROR HATES 



Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN



Test KR21« 

Mean .100 .200 .300 .400 .500 .600 .700



0.9 



5.4 



.800 .900 



: :82 :Zl :Zi :Zl :8g -88 • ? SlS 

: : :8S8 :X :Zl : : 

i.« : : :88 :S8 :8B :BS g« : j : 

:88S2 :88 :§oo! :„° 0 2 „i : °f : $ 

I, 9 ?? -SSJS -221? -gj*s -o«e : 0 °736 iSlfi :SJJj :g|?J 

.9353 .9542 .9681 .9687 .9592 .9332 .8704 .7433 7925 

2 - 7 :82s :88 -.nil :8?5 S -gg -g™ •«» 

.0000 .0001 .0018 .0079 0196 0355 n ? '2Lo *2JoS 

: I2?5° -§SS -III! -llll • •' :88 iSiJ! 

3.« .0315 :0376 jSg? .1? ^ ! 3 ' ^ 'Jf?? - 0 8 f^ 

1 1 1 iii 1 1 1 1 1 

4,5 'lili -?f2 '° 569 - 0478 -0396 .0317 .0235 0141 

. : § I : :S0 :?i :SS : gg :' 



•«S -f22i ' 4264 -'2282 .1763 1402 1127 0"90 

-.6164 -.6950 -.5579 -.3533 -.1026 .1795 .4748 .7513 .9474 



.087S 
.0090 







0136 
,0160 
,0920 
,1019 

6.3 .0056 .6168 loB* .M^ loSS I Sm| 'Jg? ' gg -»"| 

: i 1! II II II S ; 

-.0715 -: S i 7 2 7 8 .:!!IS ; 0 3 ^ :!!?? JSS :88 
7 2 IB S :'S : S ill ffi :88 

.3643 .3346 .3685 .'2875 \™\ 'gig 'Ho 7 ? • "fj •?£?« 

61 aa«irfi® Ifa 

:88 1 i : : : :^i 7 :8S 
:J2L:!2Li!2lj^.i^.j^:«8 :SSSS :88J 
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Table of the False Positive Error and its 
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 9, Theta Zero: .70, Mastery Score: 7



Test KR21- 



Mean 


.100 


.200 


.300 


.400 


.500 


.600 


.700 


.800 


.900 


0.9


onnn 
• uuuu 


• uuux 




• WUUo 


. "JJLq 


fi. m o 


. .01 


fifi n o 
. 0093 


fifiO"7 

. 009/ 




.0004 


.0011 


.0023 


.0065 


.0136


.0233 


.0307 


.0237 


.0189 




.0000 


.0000 


.0000 


.0000 


.0000 


.0002 


.0013 


.0042 


.0081 




.0000 


.0000 


.0000 


.0000 


.0006 


.0041 


.0138 


.0258 


.0228 




.9732 


.9716 


.9713 


.98*1 


.9868 


.9864 


.9798 


.9506 


.8288 


1.8 


.0007 


.0014 


.0026


.0043 


.0085 


.0135 


.0189 


.0223 


.0192 




.0072 


.0124 


. 0206 


.0324 


.0451 


.0522 


.0490 


.0367 


.0256 




.0000 


.0000 


.0000 


.0P~0 


.0003 


.0018 


.0056 


.0122 


.0175 




.0000 


.0000 


.0000 


' ;0C 


.0053 


.0182 


.0344 


.0421 


.0288 




. 9402 


. 9349 


.9596 


.9673 


. 9694 


.9630 


.9401 


.8661 


.7706 


2.7




nno7 




none 


no 7 Q 


fi 1 /. O 

. 0343 


ft *> ft 1 

. 0382 


fiO "71 

. 03/1 


fi O "7 c 

. 0275 




.0403 


.0545 


.0707 


.0827


.0323 


.0717 


.0553 


.0406 


.0314 




.0000 


,0000 


.0000 


.0005 


.0026 


.0076 


.0156 


.0245 


.0274 




.0000 


.0000 


.0014 


.0101 


.0284 


.0484 


.0589 


.0518 


.0314 




.9517 


.9203 


.9340 


.9366 


.9238 


.8875 


.8086 


.6779 


.7758 


3.6 


.0315 


.0397 


.0491 


.0574 


.0626 


.0636 


.0599 


.050b 


.0337 




. 1231 


1 / ft. / 

. 1404 


, 1400 


. 1192 


ft ft ft ft 

.0939 


.0735 


ft C ft ft 

.0593 


ft / ft ft 

. 0483 


ft O "f ft 

.0370 




.0000 


.0000 


.0011 


.0050 


.0125 


.0228 


.0334 


.0404 


.0366 




.0000 


.0023 


.0237 


.0563 


.0800 


.0859 


.0749 


.0531 


.0348 




.3773 


.3733 


.3673 


.8175 


.7177 


.5736 


.4611 


.4961 


.8504 


4.5 


.1016 


.1120 


.1151 


.1114 


.1030 


.0914 


.0771 


,0 r "f 


.0370 




. 2441 


. 1946 


.1451 


.1237 


.1090 


.0932 


.C764 




.0414 




.0000 


.0030 


.0129 


.0266 


.0404 


.0517 


.0582 


.0573 


.0438 




.0051 


.0306 


.1410 


.1496 


.1306 


.1022 


.0741 


.0525 


.0412 




.7613 


.6407 


.3272 


.0712 


-.0114 


.0492 


. 2472 


.5920 


.9246 


5.4


. zzyy 


on ni 
• ZUU1 


i i n 
. 1 lu 


i i.ro 

. 1h jo 


. 1233 


. 1026 


AO')/. 

. 0824 


fid 0 

. 0613 


fio a 




.3390 


• 3C22. 


,<.J75 


.2133 


.15 P S 


.1203 


.0899 


.0652 


.0433 




.0142 


.0474 


.0707 


.0843 


.0904 


.0903 


.08*1 


.0710 


.0475 




.4631 


.3363 


.2547 


.1723 


.1239 


.0951 


.0766 


.0630 


.0497 




-.7099 


-.7606 


-.6249 


-.3979 


-o0959 


.2409 


.5571 


.8079 


.9616 


6.3 


. r, 51 


.1531 


.1348 


.1185 


.1031 


.0878 


.0719 


.0543 


.0327 




.5347 


.3223 


.2237 


.1723 


1334 


.1042 


.0808 


.0609 


.0418 




.2199 


.1966 


.1767 


.1579 


.1394 


.1202 


.0995 


.0759 


.0462 




.5916 


.3684 


,2682 


.2074 


.1646 


.1317 


.1046 


.0806 


.0569 




-.1419 


.0338 


.1908 


.3400 


.4347 


.6259 


.7615 


.3839 


.9732 


7.2 


.0076 


.0283 


.0431 


.0514 


.0544 


.0533 


.0434 


.°394 


.0249 




.2774 


.2600 


.1775 


.1239 


.0920 


.0724 


.0590 


.0483 


.0364 




.2569 


.2259 


.1940 


.1655 


.1397 


.1155 


.0916 


.0668 


.0385 




.4220 


.4275 


.3345 


.2j59 


.1970 


.1524 


.1172 


.0374 


.0585 




-.6166 


-.6337 


-.518c 


-.2581 


.0551 


.3704 


.6459 


.3530 


.9713 


8.1 


.0000 


.0005 


.0034 


.0084 


.0139 


.0135 


.0203 


.0197 


.0137 




.0003 


.0197 


.0534 


.0703 


.0693 


.0533 


.0443 


.0335 


.0268 




.0623 


.0715 


.0772 


.0777 


.0736 


.0653 


.0543 


.0408 


.0232 




2046 


.1961 


.1583 


.1291 


.1113 


.0985 


.0358 


.0707 


.0496 




.7613 


.7473 


.626' 1 


.4490 


.3159 


.3020 


.4491 


.7395 


.9583 






INFERENCE FOR ERROR RATES 



Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 9, Theta Zero: .60, Mastery Score: [illegible]

Test KR21=

Mean 



.100 


.200 


.300 




.0000 


.0000 


. U JUU 


AAA1 


. v 004 


.0000 


.0000 


.0000 


,0000 


.0000 


.0000 


.9773 


.9791 


.9905 


.0001 


.0002 


.0004 


.0008 


.0019 


.0038 


.0000 


.0000 


.0001 


.0000 


.0000 


.0026 


.9557 


.9724 


.9818 


.0008 


.0015 


.0025 


. ill. / j 


Til 1 ^ 


.0134 


.0000 


.0001 


.0022 


.0000 


.0062 


.0502 


.9340 


.9614 


.M574 


.0056 


.0076 


. J086 


.0327 


.0263 


.0169 


.0001 


.0070 


.0282 


.0139 


.1777 


.3083 


.9225 


.8743 


.6980 


.0217 


.0190 


. 0161 


.0415 


.0399 


.0294 


.0318 


.1014 


.155i 


.9750 


.8416 


.6204 


-.3596 


-.6190 - 


.5216 



.400 


.500 


finn 


.0001 


.0002 


.0005 


> .0011 


.0022 


• 0033 


.0000 


0004 

• WW W"T 




.0011 


• 0087 


Oil 9 


.9929 


9925 


• 700J 


.0008 


0014 


0090 


.0057 


• w w u u 


OHA9 


,0009 


00/* 8 


0 1 A A 
• UXH" 


.0191 


0566 

• U JDU 


• XUjU 


.9815 


9718 

• J / JO 




.0035 


.0042 


.0046 


.0120 


.0094 


.0070 


.0104 


026*5 


os on 


."^4 


1808 


9197 
• ^ XZ / 


.9317 


8710 


761Q 
• / Pjo 


.0038 


0084 


0076 


.0122 


.0099 


.0083 


.0586 


.0917 


.1230 


.3462 


.3317 




.4393 


252A 


. XODZ 


.0136 


0115 


OOQ5 

• UU7J 


.0213 


.0156 


on s 

. UXX J 


.1931 


• 2185 


21°0 

. ^ J u 


.4726 


. 3708 


• £7 JO 


-.3695 


- 1882 


• JO 


.0128 


. 0103 


.0090 


.0202 


.0148 


.0110 


.3932 


3750 

• J / J \J 


• j*- 


.4557 


3788 


n 70 

• J X / 7 


.0268 


1339 

• X w J J 


AAO 


.CQ69 






. 0126 


00QS 
« \j \j y j 


• uu / 0 


.5341 


A709 


AO A 4 


.6081 


AQ19 


AOS! 


-.4094 


. OA7? 




.0019 


0027 


0019 
• UU J L 


.0107 


.0089 


0067 


.4677 


.4182 


.3617 


.4802 


.4463 


.3967 


-.4165 ■ 


-.2916 - 


.0601 


.0002 


.0005 


.0009 


.0027 


. 004z 


.0046 


.2315 


.2224 


. 2043 


.3288 


.3047 


.2812 


. 3239 


.2564 


. 2043 



.700 .300 .900 



0.9 



1.8 



2.7 



3.6 



4.5 



5.4 



6.3 



7.2 



8.1 



.0220 
.0024 
.4467 
1.0d88 
- . 3446 
.0016 
.0435 
.7620 
1.0340 
-.9503 

.0000 
.0004 
.5532 
.4318 
-.2196 
.0000 
.0000 
.2233 
.3515 
.9990 



.0179 
.0441 
.4327 
. 7237 

-.2303 
.0046 
.0328 
.6781 
.9880 

-.3953 

.0002 
.0063 
.5374 
.4563 
-.3339 
.0000 
.0001 
.2319 
.3529 
.3632 



.0151 
.0287 
.4169 
.5614 

-.1100 
.0063 
.0191 
.6020 
.7677 

-.7189 

.0009 
.0107 
.5086 
.4851 
-.4381 
.0000 
.0009 
.2336 
.3472 
.3626 



.0008 
.0035 
.0082 
.0677 
.9746 
.0024 
.0049 
.0M4 
.1434 
.8961 

.0045 
.0053 
.0773 
.2097 
.6111 
.0065 
.0067 
.1474 
.2353 
.2351 

.0076 
.0084 
.2318 
.2315 
.2898 

.0073 
.0081 
.3065 
.2665 
.5678 
. 0056 
.0062 
.3383 
.3319 
.5634 

.0033 
.0049 
.2977 
.3392 
.2874 
.0013 
.0039 
.1758 
.2565 
.2534 



.0011 
.0027 
.0196 
.0968 
.9161 
.0025 
.0034 
.0534 
.1463 
.7567 

.0039 
.0042 
.1008 
.1697 
.5129 
.0051 
.0052 
.1570 
.1726 
.4402 

.0056 
.0060 
.2120 

.r; \ 

.0054 
.0058 
.2517 
.2207 
.7776 
.00A5 
.0049 
.2602 
.2652 
.7985 

.0030 
.0038 
.2237 
.2750 
.6845 
.0014 
.0028 
.1351 
.2231 
.5227 



.0010 
.0019 
.0310 
.0783 
.7621 
.0019 
.0026 
.0655 
.0991 
.7222 

.0027 
.0032 
.1004 
. 1109 
.7588 
.0032 
.0037 
.1320 
.1253 
.3437 

.0034 
.0039 
.1556 
.1476 
.9134 

.0032 
.C039 
.1666 
.1737 
.9471 
.0028 
.0035 
.1602 
.1939 
.9550 

.0021 
.0029 
.1321 
.1964 
.9437 
.0011 
.0021 
.0793 
.1654 
.9118 
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 9, Theta Zero: .70, Mastery Score: 8

Test   KR21=
Mean    .100   .200   .300   .400   .500   .600   .700   .800   .900

0.9    .0000  .0000  .0000  .0001  .0003  .0007  .0016  .0027  .0030
       .0000  .0001  .0004  .0012  .0032  .0064  .0093  .0091  .0059
       .0000  .0000  .0000  .0000  .0000  .0004  .0023  .0080  .0169
       .0000  .??00  .0000  .0000  .0010  .0071  .0253  .0517  .0515
       .9808  .9797  .9799  .9903  .9925  .991?  .9866  .9622  .8268

1.8    .0001  .0002  .0004  .0009  .0019  .0035  .0053  .0066  .0060
       .0008  .0019  .0040  .0078  .0126  .0159  .0157  .0119  .0030
       .0000  .0000  .0000  .0000  .0006  .0030  .0101  .0236  .0369
       .0000  .0000  .0000  .0014  .0098  .0323  .0654  .0875  .0656
       .9594  .9537  .9763  .9819  .9825  .9770  .9585  .8921  .7429

2.7    .0008  .0015  .0027  .0046  .0071  .0095  .0112  .0113  .0086
       .0075  .0122  .0188  .0247  .0265  .0236  .?182  .0129  .0098
       .0000  .0000  .0001  .0009  .0044  .0133  .0289  .0484  .0586
       .0000  .0000  .0022  .0168  .0493  .0890  .??66  .1123  .0709
       .9336  .9482  .9637  .9650  .9550  .9269  .8589  .712?  .7127

3.6    .0056  .0083  .0117  .0152  .0176  .????  .0182  .0157  .0107
       .0339  .0442  .0478  .0419  .0326  .0245  .0189  .0150  .0116
       .0000  .0001  .0017  .0083  .0217  .0412  .0634  .0817  .0797
       .0000  .0044  .0388  .0964  .1448  .1663  .1568  .1192  .0753
       .8897  .9319  .92?9  .8961  .8218  .?934  .5360  .4696  .7797

4.5    .0249  .0303  .0331  .0333  .0315  .0?84  .0242  .0189  .0118
       .0982  .0798  .0537  .0410  .0345  .0293  .0241  .0137  .0131
       .0001  .0049  .0213  .0457  .????  .0065  .1143  .1191  .0971
       .0079  .1307  .2409  .2719  .2543  .2135  .1633  .1148  .0864
       .374?  .????  .5970  .3131  .1364  .0995  .1951  .46?8  .8826

5.4    .0745  .0655  .0560  .0476  .0401  .0332  .0266  .0197  .0118
       .1261  .128?  .0977  .0722  .0536  .0399  .0295  .0212  .0139
       .0224  .0781  .1215  .1510  .1690  .????  .1719  .1523  .1074
       .7446  .6761  .4848  .3518  .2674  .9004  .1566  .????  .1054
      -.?627 -.6318 -.5847 -.4181 -.1988  .0728  .3914  .7163  .9449

6.3    .0667  .0558  .04?8  .0411  .0351  .0295  .0238  .0177  .0106
       .2292  .1297  .0877  .0637  .0478  .0363  .0275  .0203  .0136
       .3639  .3478  .3266  .3040  .2786  .2492  .2139  .1690  .1066
       .8937  .5817  .4396  .3522  .2?03  .2423  .202?  .1652  .1252
      -.2844 -.1283  .0270  .1831  .3561  .5289  .7000  .8556  .9675

7.2    .00??  .0109  .0161  .0186  .0192  .0184  .0164  .0131  .0081
       .1090  .0954  .0615  .0415  .0307  .0243  .0199  .0162  .0120
       .5379  .4764  .4159  .3611  .3100  .2605  .2100  .1553  .0907
       .7749  .7808  .6231  .4919  .3927  .3154  .2517  .1946  .1350
      -.0299 -.?017 -.6252 -.3353  .0172  .3581  .6418  .8497  .9699

8.1    .0000  .0002  .0013  .0031  .0051  .0065  .0072  .0066  .0045
       .0001  .0077  .0200  .3253  .0240  .0194  .0145  .0111  .0089
       .2288  .2308  .2264  .2136  .1936  .1677  .1365  .0999  .0561
       .3513  .3396  .3141  .2935  .2724  .2463  .2140  .1740  .1201
       .3444  .269?  .1241  .0098 -.0005  .1116  .?720  .7312  .9584
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 9, Theta Zero: .80, Mastery Score: C 



Test KR21 
Mean 



1 ftft 

. 100 


ft ft /\ 

.200 


. 300 


.400 


,500 


.0000 


.0000 


.0000 


. 0001 




.0000 


.0001 


.0004 


.0012 


.0035 


- 0000 


.0000 


.0000 


.0000 


.0000 


0000 


.0000 


.0000 


.0000 


.0000 


ft ^ £ ft 

. 9768 


.9747 


.9917 


.9739 


.9900 


.0 on l 


.0002 


.0004 


.0009 


.0021 


• 0003 


.0019 


.0040 


.0083 


.0159 


. 0000 


.0000 


.0000 


.0000 


.0000 


ft ft ftft 

.0000 


. 0000 


.0000 


.0000 


.0006 


ft / ~? ft 

. 9479 


.9300 


.9246 


.9769 


.9811 


.0000 


.0015 


.0027 


.0049 


.0036 


.0075 


.0122 


.0195 


.0304 


.0433 


ft ft ft r\ 

• 0000 


ft ft ft ft 

,0000 


.0000 


.0000 


.0003 


ft ft f\f\ 

. 0000 


ft ft ft ft 

.0000 


.0000 


.0006 


.050 


ft / C £ 

. 9456 


ft / r\ ft 

9403 


.9534 


.9632 


.9668 


. 0056 


.0083 


.0123 


.0180 


.0252 


.0339 


.0455 


.0603 


. 0741 


0786 


.0000 


.0000 


.0000 


.0003 


.0 1 


ft ft ft ft 

. 0000 


.0100 


.0007 


.0073 


.0252 


O ft ft o 

. 5903 


.95?7 


.9338 


.9401 


.9325 


ftft / ft 
. 0249 


.0319 


.0405 


.0493 


.0562 


. 1001 


.1179 


. 1263 


.1147 


.0922 


ft ft ^ ft 
. ^000 


.0000 


.0006 


.0037 


.0110 


ftft ftft 

. 0000 


.0011 


.01^8 


.0478 


.0790 


.9990 


.8353 


.G900 


.3609 


.7859 


.0313 


.0926 


.0992 


. 0996 


i U74U 


.2148 


.1910 


.1396 


.1101 


.0956 


. 0000 


.0017 


.0098 


.0234 


.0392 


.0015 


.0546 


.1258 


.1538 


.1459 


. 8030 


.7579 


.5528 


.2613 


.C"»8 


. 2020 


.1823 


. 1584 


.1360 


.11^5 


ft/* ft r\ 

. 2639 


.3102 


.2547 


.1970 


.1513 


. 0093 


.0410 


.0680 


.0861 


.0963 


O *> ft ft 

. 3709 


.4059 


.2912 


.2005 


.1405 


. 4041 - 


.7205 - 


.6496 - 


.4835 


-.2238 


1 6 1 


1 ^oi 


. 1214 


1 A C ft 

. 1059 


. 0914 


.5146 


.3091 


.2187 


.1649 


.1282 


.2371 


.2139 


.1934 


.1737 


.1536 


.6120 


.3872 


.2371 


.2267 


.1844 


.1457 


.0468 


,2196 


.3818 


.5350 


.0023 


.0140 


.0252 


.0324 


.0358 


.1176 


.1821 


.1447 


.1053 


.0779 


.2252 


.2062 


. 1804 


.1546 


.1299 


.3384 


.3698 


.3270 


.2703 


.2198 


.2091 - 


.5335 - 


.4530 - 


.2443 


.0534 



.600 ,700 .800 .900 



0.9 .0000 .0000 .0000 .0001 .0003 .0009 .0024 .0052 .0074 

... oogg qi76 Q236 qi72 

.0000 .0003 .0021 .0067 

.0007 .0052 .0181 .0262 

i ■> n ,„, A/wwt 9916 .9902 .9798 .8945 

,0002 - 00CA - 0009 - 0021 - 0M5 -0087 .0133 .0155 

.0268 .0359 .0345 .0219 

.0003 .0018 .0068 .0153 

.0046 ,0170 .0354 .0348 

.9821 .9768 .9510 .3105 

!.7 .0000 .0015 .0027 .0049 .0036 .0141 .0205 .0254 .0233 

.0523 .0510 .0386 .0254 

.0017 .'J063 .0156 .0254 

0181 .0337 .0525 .0378 

- , nn „ 9625 .9428 .8741 .7105 

3,6 - 0 ? 83 - 0123 - 0180 -°252 .0326 .0379 .0385 .0300 

.0710 .0553 .0393 .0299 

.0072 .0167 .0292 .C361 

.0494 .0663 .0636 .0375 

.9038 .8321 .6765 .6831 

''* 5 '?? 19 - 0A05 - 0 * 93 . -0562 .0594 .0579 .0505 .0345 

0707 .0552 .0448 .0355 

0224 .0359 .0471 .0461 

0941 .0880 .0637 .0387 

6505 .4790 .4122 .7911 

5,4 "? 926 - 0992 - 0956 - 09 * 8 - 08 * 0 -0737 .0577 .0361 

.0837 .0708 .0565 .0406 

.0539 .0644 .0667 .0532 

.1192 .0864 .0581 ..)466 

6.3 .2020 .1823 .1584 .1365 ! h.i $8 [IVlO '.till !o338 

.1158 .G877 .0645 .0431 

.0992 .0948 .0816 .0552 

.1034 ,0821 .0702 .0598 

.1260 .5064 .8092 .9677 

7,2 'I*} 6 , 'III} ' 121A - 1059 -° 9 ^ -0771 .0623 .0462 .0270 

.1009 .0791 .0602 .0411 

.1324 .1092 .0324 .0490 

.1516 .1241 .0987 .0708 

ft i "'nnoo "nwn ' Z1 ™ - 3818 - 5350 - 6780 -8068 .9126 .9813 

8.1 .0023 .0140 .0252 rm* n^ fl . 0356 <0323 ^ 

.0608 .0506 .0429 .0324 

.1060 .0824 .0580 .0316 

.1771 .1400 .1055 .0686 

.3925 .6916 .3885 .9806 
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 10, Theta Zero: .60, Mastery Score: 6

Test   KR21=
Mean    .100   .200   .300   .400   .500   .600   .700   .800   .900

1.0    .0004  .0010  .0021  .0041  .0076  .0126  .0183  .0221  .0191
       .0053  .0111  .0197  .0327  .0488  .0613  .0621  .0490  .0345
       .0000  .0000  .0000  .0000  .0001  .0005  .0017  .0038  .0055
       .0001  .0000  .0000  .0002  .0016  .0062  .0133  .0175  .0124
       .9494  .9449  .9521  .9713  .9733  .9693  .9520  .8922  .8148

2.0    .0095  .0139  .0199  .0279  .0371  .0456  .0508  .0494  .0364
       .0585  .0755  .0952  .1117  .1154  .1039  .0830  .0621  .0478
       .0000  .0000  .0000  .0002  .0010  .0031  .0066  .0105  .0115
       .0001  .0000  .000?  .0038  .0121  .0223  .0283  .0254  .0159
       .8372  .8992  .9217  .9273  .9192  .8910  .8276  .7231  .8151

3.0    .0566  .0674  .0794  .0898  .0960  .0964  .0901  .0758  .0502
       .1325  .1994  .1987  .1742  .1423  .1149  .0939  .0766  .0582
       .0000  .0000  .0005  .0024  .0061  .0113  .0164  .0196  .017?
       .0000  .0011  .0111  .0282  .0412  .0448  .0392  .0281  .0135
       .9026  .8234  .8222  .7769  .6888  .5768  .4961  .5553  .8684

4.0    .1784  .1883  .1886  .1799  .1650  .1457  .1225  .0948  .0590
       .3181  .2673  .2195  .1957  .1736  .1484  .1214  .0939  .0652
       .0000  .0017  .0072  .0148  .0222  .0279  .0308  .0297  .0222
       .0026  .0447  .0791  .0833  .0717  .0557  .0406  .0292  .0220
       .6275  .4821  .1798 -.0055  .0269  .0796  .3046  .6317  .9244

5.0    .3643  .3170  .2724  .2337  .1990  .1667  .1349  .1014  .0619
       .6066  .5753  .4330  .3235  .2446  .1858  .139?  .1018  .0676
       .0090  .0290  .??22  .0491  .0515  .0504  .04??  .0383  .0254
       .2868  .2234  .1450  .0968  .0704  .0543  .0441  .0350  .0259
      -.7948 -.7867 -.6245 -.3606  .0333  .2941  .5772  .8017  .9543

6.0    .2603  .2335  .2106  .1891  .1677  .1455  .1215  .0939  .0586
       .7163  .4484  .3267  .2518  .1936  .1575  .1234  .0935  .0647
       .1381  .1195  .1049  .0921  .0800  .0682  .0560  .0427  .0263
       .4124  .2454  .1720  .1233  .0983  .0759  .0580  .0429  .0290
      -.1455  .0183  .1650  .3055  .4440  .5824  .7206  .8533  .9624

7.0    .0134  .0475  .0725  .0874  .0943  .0944  .0881  .0743  .0496
       .4589  .4162  .2889  .2053  .1535  .1196  .0953  .0760  .0?72
       .1523  .1338  .1146  .0976  .0324  .0684  .0547  .0406  .0242
       .2589  .2530  .1969  .1468  .1101  .0829  .0619  .0450  .0297
      -.5209 -.6531 -.5067 -.2767  .0016  .2859  .5594  .7947  .9550

8.0    .0000  .0016  .0086  .0197  .0318  .0421  .0484  .0477  .0357
       .0015  .0508  .1124  .1361  .??03  .1099  .0846  .0670  .0468
       .0407  .0490  .0541  .0550  .0526  .0476  .0406  .0313  .0190
       .1539  .1401  .1046  .0810  .0673  .1570  .0477  .0382  .0272
       .8419  .8060  .6662  .4768  .3450  .3232  .4265  .6678  .9315

9.0    .0000  .0000  .0002  .0015  .0049  .0103  .0166  .0210  .0136
       .0000  .0003  .0066  .?252  .0430  .0629  .0630  .0491  .0337
       .0031  .0055  .0088  .0128  .0165  .0138  .0190  .0166  .0108
       .0290  .0424  .0550  .0583  .0521  .0418  .0323  .0261  .0206
       .9346  .9435  .9530  .9439  .9153  .8532  .7461  .6845  .3948
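
The tabled standard errors are scaled by SQRT(M), so each must be divided by the square root of the sample size before use. The Python sketch below illustrates this, assuming that M is the number of examinees and that the five entries of a cell follow the order of the table title (FP, its scaled S.E., FN, its scaled S.E., correlation). The cell shown is the one for 10 items, Theta Zero .60, Mastery Score 6, Mean 5.0, KR21 .600. The sample size of 100 and the function names are hypothetical, and combining the two errors into a total error rate is an added illustration rather than a step taken in the tables.

```python
import math


def unscale(scaled_se, m):
    """Turn a tabled S.E.*SQRT(M) entry into a standard error for sample size m."""
    return scaled_se / math.sqrt(m)


def se_of_total_error(se_fp, se_fn, corr):
    """Standard error of FP + FN, using the tabled FP-FN correlation."""
    variance = se_fp ** 2 + se_fn ** 2 + 2.0 * corr * se_fp * se_fn
    return math.sqrt(variance)


# Cell entries: FP, S.E.*SQRT(M) of FP, FN, S.E.*SQRT(M) of FN, correlation.
fp, fp_scaled_se, fn, fn_scaled_se, corr = 0.1667, 0.1858, 0.0504, 0.0543, 0.2941
m = 100  # hypothetical number of examinees (assumption, not from the tables)

se_fp = unscale(fp_scaled_se, m)
se_fn = unscale(fn_scaled_se, m)
se_total = se_of_total_error(se_fp, se_fn, corr)

print(f"FP = {fp:.4f} with S.E. {se_fp:.4f}")
print(f"FN = {fn:.4f} with S.E. {se_fn:.4f}")
print(f"FP + FN = {fp + fn:.4f} with S.E. {se_total:.4f}")
```

Because the correlation enters the variance of the sum, a strongly negative tabled correlation makes the total error rate more stable than either error taken alone.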
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 10, Theta Zero: .60, Mastery Score: 7

Test   KR21=
Mean    .100   .200   .300   .400   .500   .600   .700   .800   .900

1.0    .0000  .0001  .0004  .0010  .0023  .0044  .0071  .0092  .0034
       .0008  .0021  .0050  .0104  .0183  .0257  .0278  .0224  .0152
       .0000  .0000  .0000  .0000  .0001  .0009  .0033  .0079  .0120
       .0001  .0000  .0000  .0003  .0030  .0120  .0273  .0380  .0283
       .9655  .9620  .9717  .9840  .9852  .9820  .9693  .9214  .8212

2.0    .0016  .0030  .0052  .0086  .0130  .0175  .?208  .0211  .0161
       .0150  .0234  .0349  .0464  .0517  .0483  .0383  .0281  .0211
       .0000  .0000  .0000  .0003  .0019  .0060  .0133  .0220  .0253
       .0000  .0000  .00?7  .0070  .0233  .0446  .0595  .0565  .0358
       .?220  .?339  .9574  .9607  .9543  .9332  .8821  .7755  .8032

3.0    .0146  .?200  .0267  .0332  .0380  .0400  .0386  .0333  .0224
       .0748  .0913  .0975  .0874  .0703  .0545  .0426  .0339  .0259
       .0000  .0000  .0008  .0044  .0118  .0223  .0337  .0417  .0385
       .0000  .0020  .0203  .0536  .0817  .0930  .0854  .0635  .0411
       .8875  .9062  .9065  .8764  .8103  .7061  .5935  .5782  .8481

4.0    .0645  .0737  .0778  .0766  .0718  .0643  .0546  .0426  .0266
       .1908  .1572  .1138  .0921  .0790  .0669  .0547  .0423  .0294
       .0000  .0030  .0132  .0281  .0435  .0564  .0644  .0642  .0498
       .0045  .0811  .1502  .1658  .1500  .1216  .0911  .0650  .0483
       .8046  .7277  .4878  .2445  .1351  .1608  .3112  .5943  .9107

5.0    .1727  .1507  .1288  .1097  .0928  .0772  .0621  .0465  .0282
       .2947  .2892  .2170  .1602  .1193  .0892  .0661  .0474  .0310
       .0159  .0535  .0804  .0967  .1044  .1051  .0988  .0844  .0576
       .5174  .4430  .3002  .2093  .1539  .1186  .0942  .0751  .0572
      -.6312 -.7145  .5757 -.3547 -.0772  .2253  .5138  .7723  .9489

6.0    .1427  .1232  .1080  .0947  .0823  .0701  .0576  .0438  .0270
       .4359  .2588  .1809  .1347  .1030  .0794  .0606  .0447  .0301
       .2583  .2333  .2115  .1907  .16?7  .1478  .1238  .0960  .0601
       .6842  .4297  .3144  .2437  .1935  .1544  .1219  .0930  .0649
      -.1720 -.0063  .1437  .2883  .4313  .3741  .7160  .8520  .9622

7.0    .0077  .0264  .0390  .0457  .0480  .0469  .0428  .0352  .0230
       .2598  .2209  .1448  .0991  .0731  .0572  .0459  .0365  .0270
       .3392  .2966  .2553  .2190  .1862  .1555  .1253  .0934  .0560
       .5525  .5370  .4103  .3099  .2363  .1810  .1376  .1015  .0680
      -.7416 -.7466  .5752 -.3083  .0083  .3203  .5937  .8133  .9586

8.0    .0000  .0009  .0048  .0106  .0167  .0215  .0240  .0230  .0167
       .0010  .0286  .0609  .0706  .0648  .0526  .0395  .0290  .0221
       .1318  .1417  .1449  .1406  .1303  .1156  .0970  .0741  .0445
       .2822  .2531  .2059  .1768  .1553  .1344  .1128  .0901  .0637
       .6712  .5844  .3805  .2030  .1442  .2058  .3879  .6812  .9389

9.0    .0000  .0000  .0001  .0008  .0026  .0054  .0084  .0102  .0088
       .0000  .0002  .0037  .0136  .0250  .0315  .0302  .0228  .0159
       .0181  .0246  .0324  .0404  .0467  .0496  .0478  .0403  .0256
       .1001  .1198  .1341  .1316  .1144  .0928  .0750  .0628  .0493
       .?783  .8915  .8981  .8838  .8396  .7555  .6462  .6439  .9031
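
A single FP/FN entry can be checked approximately with the Python sketch below. It assumes a beta-binomial model in which the true score theta follows a beta distribution whose two parameters are recovered from the test mean and KR21; the tables themselves do not state this model, so it is an assumption of the sketch. The function names and the midpoint-rule integration are illustrative choices, and the integration is only reliable when both beta parameters exceed 1. The example uses the cell for 10 items, Theta Zero .60, Mastery Score 7, Mean 5.0, KR21 .100, whose tabled FP and FN are .1727 and .0159.

```python
import math


def beta_binomial_params(n, mean, kr21):
    """Beta parameters implied by test length, test mean, and KR21 (assumed model)."""
    alpha = (1.0 / kr21 - 1.0) * mean
    beta = n / kr21 - n - alpha
    return alpha, beta


def joint_prob(n, x, alpha, beta, lo, hi, steps=4000):
    """P(score = x and lo < theta < hi), by midpoint-rule integration over theta."""
    log_const = (
        math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
        + math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    )
    a = x + alpha - 1.0      # exponent of theta in the joint density
    b = n - x + beta - 1.0   # exponent of (1 - theta)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h  # midpoints stay strictly inside (0, 1)
        total += math.exp(log_const + a * math.log(t) + b * math.log1p(-t))
    return total * h


def error_rates(n, cutoff, theta0, mean, kr21):
    """False positive and false negative probabilities under the assumed model."""
    alpha, beta = beta_binomial_params(n, mean, kr21)
    # False positive: passes the test (score >= cutoff) but theta < theta0.
    fp = sum(joint_prob(n, x, alpha, beta, 0.0, theta0) for x in range(cutoff, n + 1))
    # False negative: fails the test (score < cutoff) but theta >= theta0.
    fn = sum(joint_prob(n, x, alpha, beta, theta0, 1.0) for x in range(cutoff))
    return fp, fn


fp, fn = error_rates(n=10, cutoff=7, theta0=0.60, mean=5.0, kr21=0.100)
print(f"FP = {fp:.4f}, FN = {fn:.4f}")
```

Agreement with the tabled entries to about two or three decimals would support the assumed model; the standard errors and the correlation are not reproduced by this sketch.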
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 10, Theta Zero: .60, Mastery Score: 8

Test   KR21=
Mean    .100   .200   .300   .400   .500   .600   .700   .800   .900

1.0    .0000  .0000  .0001  .0002  .0005  .0012  .0021  .0029  .0028
       .0001  .0003  .0009  .0025  .0052  .0081  .0092  .0076  .0050
       .0000  .0000  .0000  .0000  .0002  .0013  .0052  .0132  .0215
       .0001  .0000  .0000  .0004  .0044  .0186  .0445  .0667  .0536
       .9751  .9725  .9831  .9907  .9912  .9884  .9780  .9360  .8101

2.0    .0002  .0005  .0010  .00?0  .0035  .0051  .0065  .0068  .0054
       .0025  .0050  .0092  .0140  .0169  .0165  .0134  .0096  .0070
       .0000  .0000  .0000  .0005  .0028  .0093  .0214  .0373  .0458
       .0000  .0000  .0011  .0103  .0354  .0710  .1005  .1026  .0677
       .9459  .9559  .9754  .9773  .9718  .9549  .9113  .8027  .7683

3.0    .0027  .0043  .0067  .0093  .0114  .0125  .0124  .0110  .0075
       .????  .02?5  .0335  .0313  .0253  .0192  .0145  .0113  .0086
       .0000  .0000  .0012  .0065  .0179  .0351  .0552  .0719  .0704
       .0000  .0028  .0295  .0805  .1284  .1541  .1508  .1183  .0760
       .9131  .9470  .9473  .9261  .8758  .7839  .6553  .5751  .8008

4.0    .0165  .0210  .0226  .0242  .0232  .0211  .0181  .0143  .0090
       .0762  .0651  .0443  .0328  .0263  .0224  .0183  .0142  .0099
       .0001  .0042  .0193  .0423  .0677  .0911  .1082  .1130  .0923
       .0062  .1165  .2252  .2612  .2497  .2142  .1679  .1206  .0871
       .8923  .8513  .6873  .4490  .2748  .2220  .2925  .5167  .8756

5.0    .0587  .0513  .0443  .0377  .03??  .0264  .0212  .0158  .0096
       .1034  .1013  .0771  .0563  .0420  .0312  .0229  .0263  .0106
       .0223  .0779  .1212  .1507  .1685  .1756  .1713  .1522  .1083
       .7386  .6762  .4903  .3614  .2745  .2129  .1?69  .1318  .1032
      -.3679 -.6326 -.5306 -.3579 -.1414  .1122  .3981  .6951  .9310

6.0    .0562  .0469  .0401  .0345  .0295  .0248  .0201  .0151  .0092
       .1933  .1081  .0724  .0521  .0387  .0291  .0217  .0157  .0104
       .3810  .3591  .3373  .3141  .2884  .2587  .2231  .1780  .1148
       .9402  .6141  .4641  .3707  .3034  .2501  .2047  .1629  .1197
      -.2712 -.1257  .0171  .1652  .3209  .4846  .6537  .8194  .9543

7.0    .0032  .0105  .0151  .0173  .0178  .0170  .0152  .0123  .0079
       .1060  .0844  .0525  .0350  .0258  .0203  .0163  .0129  .0094
       .5877  .5191  .4538  .3955  .3417  .2897  .236?  .1790  .1088
       .8574  .8293  .6457  .5014  .3945  .3123  .2454  .1869  .1295
      -.8737 -.8331 -.6587 -.3704 -.0252  .3032  .5820  .8048  .9555

8.0    .0000  .0004  .0019  .0041  .0063  .0080  .0087  .0081  .0058
       .0004  .0116  .0238  .0265  .0235  .?1?5  .0136  .0101  .0077
       .3251  .3254  .3161  .2964  .2686  .2346  .1946  .1?73  .0880
       .3852  .3673  .3461  .3267  .2984  .2621  .2214  .1774  .1255
       .3049  .1471 -.0439 -.1278 -.0891  .0580  .3191  .6683  .9380

9.0    .0000  .0000  .0000  .0003  .0010  .0020  .0031  .0037  .0031
       .0000  .0001  .0015  .0053  .0095  .0115  .0107  .0078  .0055
       .079?  .0903  .1000  .1099  .1143  .1132  .1034  .0837  .0515
       .2296  .2423  .2471  .2328  .2044  .1744  .1504  .1301  .1004
       .9990  .7549  .7425  .7206  .6541  .5540  .4777  .5722  .9003
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 10, Theta Zero: .70, Mastery Score: 7

Test   KR21=
Mean    .100   .200   .300   .400   .500   .600   .700   .800   .900

.0008 .0021 .0050 .0106 .0208 .0357 0490 0485 rm? 

•o°Sn°? -n°Sn°S "2222 '2222 ' 000 ° 0QQl -0005 ioSlS 1 0041 

"S2Si -22?? -2222 • 000 ° - 0002 - 0015 -ooeo .0125 ons 

-> 0 'ooifi 'onto *no?5 -2252 - 9832 - 9783 - 9546 - 8468 

" '0150 ML -22H '22?? -SIK '° 226 - 0315 - 0377 -0334 

• ' 023Z * .0355 .0522 .071C .0837 0815 nfill rum 

.0000 .0000 .0000 .0000 0001 0007 0025 0055 008° 

'9283° - 9 °2?2 -8SSS -2S8 -SS22 0079 0l " : ° 21 * 

.9283 .9238 .9350 .9558 .9599 .9562 .9375 .8760 .7800 

3,0 '§748 'awl '??£! -?J!S - 0584 - 0650 - 0637 -0^2 

.0748 .0926 .1131 .1299 .1323 .1180 0936 fifiQ? ns?fi 

.0000 .0000 .0000 .0002 0011 0034 0075 0123 *0141 

M98 '§?42° -So 0 ?; -o 0 ?!2 -° 127 Q ™ ^"o IfllM 

/ « •5?? 8 ' 91/ * 2 -9072 .9146 .9062 .8753 8069 6997 7711 

- »3 IIS? 182 •«& -J58 -1°' 6 : "» : ' 

:o l So J o J .1J§o 7 : 2 : : -SHI -g?|? -gJJ -gffi 

mm -ana -S?l! -US :<S« :«94 :8i?S :8i?i 

.9990 .8172 .8194 .7758 .6844 .5597 .4586 .5010 .8452 

5,0 '3252 "a?*? 'llll '}US - 1576 - 1334 -1° 38 -0651 

.JZ5£ .2791 .2249 .1990 .1784 1545 1978 noo* n*a«; 

.0000 .0013 .0062 .0135 ionO .0272 iJS! .MOl !o229 



•Sn! -Kf! *2J^ - 0799 - 0702 -° 5 ^ .0392 :6276 0211 

.6504 5346 .2405 .0139 -.0486 .0233 .2451 !5959 .921* 

6 "° •Si? iii? -iff? -5532 - 2089 - 1750 .1062 .oo46 

.5491 .5717 .4442 .3370 .2569 1962 1481 ma* rvii* 

•S22? -2?L 6 - 0389 '° 463 0493 !0487 '.Ull ."0375 ' Olll 

-'7472 -5522 '° 666 - 0512 .0«* 0335 .0254 

"-ZJSS "-S?5! - • 5156 -- 0868 - 2649 .5734 .soss .9534 



7.0 .2651 .2370 .2132 .1908 .*1687 .*1459 1212 0910 ns7? 

' 318° U39° 'n 3 ^ -ft?? - 2076 :}30 ' :0994 \olll 

.1318 .1139 .0999 .0875 .0759 .0645 .0527 .0393 .0242 



-*1218 "a/fif -KS *i??2 : ° 735 - 0566 - 0423 -0288 

-.1218 .0463 .1952 .3365 .47*3 .6115 .7458 .8714 .9685 



'3397 "3802 4ll ' fSJ -??Jf - 0789 - 0661 -°* 32 

1262 li!f "nlln 'lata ' IMl ' 1201 - 0961 - 0778 -0594 

*2037 "2106 *??n2 -?252 - 0592 - 0471 - 0344 -o 200 

.2037 .2106 .1709 .1319 .1013 .0778 .0592 .0430 .0291 



n n -.1346 -.5443 -.4412 - 2348 6258 3098 5864 aiS oma 

9,0 -222? -2523 - 0037 - 0105 o* 9 * iwfo .'0322 '0321 '0235 

.0001 .0191 .0648 .0972 .1053 0950 0752 niii n7?5 

loo 8 -i°o 2 ^ -82 -n 03 ^ 6 : „°g; : P 7 : ° 2 " $K ; 0 °J 

396? ' '2?2? 'Son? * 0547 - 0459 - 0395 -0331 .0239 

.3965 .8897 .8391 .7291 .5833 .4757 .4962 .7028 .9447 
Table of the False Positive Error and its
S.E.*SQRT(M), the False Negative Error and its
S.E.*SQRT(M), and the Correlation between FP and FN
Number of Items: 10, Theta Zero: .70, Mastery Score: 8

Test   KR21=
Mean    .100   .200   .300   .400   .500   .600   .700   .800   .900

1.0    .0000  .0000  .0001  .0002  .0006  .0016  .0035  .0060  .0071
       .0001  .0003  .0009  .0026  .0064  .0131  .0201  .0213  .0141
       .0000  .0000  .0000  .0000  .0000  .0002  .0011  .0042  .0095
       .0001  .0000  .0000  .0000  .0003  .0030  .0127  .0284  .0292
       .9764  .9745  .9726  .9858  .9900  .9903  .9864  .9681  .8633

2.0    .0002  .0005  .0010  .0021  .0043  .0077  .0119  .0153  .0143
       .0025  .0050  .0094  .0169  .0267  .0348  .0359  .0283  .0186
       .0000  .0000  .0000  .0000  .0002  .0014  .0053  .0131  .0210
       .0000  .0000  .0000  .0005  .0043  .0162  .0357  .0497  .0372
       .9508  .9472  .9634  .9749  .9774  .9740  .9594  .9084  .7845

3.0    .0027  .0043  .0071  .0111  .0164  .0218  .0259  .0266  .0208
       .0200  .0292  .0414  .0534  .0584  .0540  .0429  .0307  .0226
       .0000  .0000  .0000  .0004  .0021  .0071  .0161  .0277  .0336
       .0000  .0000  .0008  .0076  .0256  .0498  .0674  .0648  .0401
       .9189  .9273  .9484  .9532  .9465  .9231  .8671  .7490  .7511

4.0    .0165  .0221  .0291  .0362  .0417  .0442  .0431  .0376  .0259
       .0775  .0940  .1014  .0919  .0741  .0571  .0444  .0351  .0269
       .0000  .0000  .0008  .0043  .0121  .0238  .0372  .0477  .0458
       .0000  .0016  .0188  .0532  .0847  .0990  .0925  .0687  .0428
       .9990  .9016  .9039  .8755  .8085  .6956  .5623  .5194  .8114

5.0    .0643  .0741  .0793  .0791  .0749  .0677  .0579  .0455  .0287
       .1?80  .1615  .1166  .0924  .0791  .0677  .0559  .0436  .0305
       .0000  .0024  .0120  .0272  .0438  .0586  .0687  .0702  .0559
       .0028  .0704  .1449  .1685  .1567  .1284  .0955  .0666  .0497
       .3144  .7570  .5438  .2802  .1264  .1144  .????  .????  .7?1?

6.0    .1711  .1517  .1307  .1118  .0948  .0790  .0635  .0475  .0287
       .2571  .2754  .2147  .1615  .1215  .0915  .0681  .0491  .0323
       .0131  .0497  .0734  .0968  .1066  .10?9  .1037  .0895  .0615
       .4665  .4476  .3129  .2183  .1580  .1?17  .0946  .0766  .0604
       .5172 -.7090 -.6032 -.4101 -.1463  .1672  .4897  .7690  .9519

7.0    .1415  .1213  .1065  .0931  .0807  .0685  .0560  .0423  .0257
       .4414  .2618  .1830  .1353  .1045  .0808  .0620  .0462  .0313
       .2613  .2370  .2155  .1947  .1736  .1512  .1265  .0977  .0605
       .6762  .4268  .3143  .2454  .1967  .1589  .1272  .0988  .0702
       .????  .????  .????  .????  .4474  .5940  .7366  .8681  .?681

8.0    .0047  .????  .0319  .0387  .0415  .0409  .0374  .0306  .0196
       .1883  .1980  .1387  .0969  .0715  .0557  .0449  .0364  .0273
       .3159  .2815  .2442  .2099  .1783  .1483  .1184  .0870  .0508
       .4560  .4882  .3945  .3074  .2396  .1869  .1446  .1085  .0731
       .5?91 -.6904 -.5431 -.2958  .0117  .3295  .6139  .8345  .9664

9.0    .0000  .0003  .0020  .0055  .0097  .0134  .0156  .0151  .0107
       .0001  .0105  .0344  .0496  .0515  .0446  .0343  .025?  .0199
       .0799  .0897  .0964  .0976  .0932  .0841  .0708  .0533  .0308
       .2294  .2261  .1918  .1586  .1365  .1204  .1052  .0873  .0621
       .8150  .7162  .6228  .4706  .3425  .3113  .4234  .7064  .9501
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Table of the False Positive Error and its 
S.E.*SQRT(M), the False Negative Error and its 
S.E.*SQRT(M), and the Correlation between FP and FN 
Number of Items: 10, Theta Zero: .80, Mastery Score: 8 

Test KR21=

Mean    .100   .200   .300   .400   .500   .600   .700   .800   .900

1.0    .0000  .0000  .0001  .0002  .0006  .00??  .0044  .0097  .0146
       .????  .????  .0009  .0026  .0067  .0158  .0314  .0446  .0343
       .0000  .0000  .0000  .0000  .0000  .0000  .0001  .0007  .0026
       .0000  .0000  .0000  .0000  .0000  .0002  .0017  .0068  .0107
       .9718  .9662  .9634  .9594  .9874  .9885  .9880  .9796  .9112

2.0    .0002  .0005  .0010  .0022  .0045  .0090  .0167  .0265  .0307
       .0025  .0050  .0094  .0173  .0304  .0492  .0666  .0670  .0436
       .????  .0000  .0000  .0000  .0000  .0001  .0006  .0026  .0062
       .0000  .0000  .0000  .0000  .0001  .0014  .0061  .0140  .0144
       .9464  .9656  .9364  .9674  .9730  .9756  .9724  .9512  .8351

3.0    .0027  .0043  .0071  .0114  .0183  .0282  .0402  .0499  .0467
       .0200  .0292  .0420  .0597  .0808  .0970  .0971  .0763  .0498
       .0000  .0000  .0000  .0000  .0001  .0006  .0023  .0062  .0104
       .0000  .0000  .0000  .0001  .0015  .0064  .0151  .0215  .0156
       .9372  .9732  .9259  .9470  .9525  .9511  .9351  .8773  .7369

"° '8» ■??& -° 526 : »«6 :?75 5 4 :0767 iOoOS 

"2^5 -22 51 - 1160 - 13bl - 1442 -1333 .1072 .0777 .0581 

•2222 *°222 • 000 ° - 0001 -o 00 ' -0027 .0066 .0119 .0150 

•2222 -???° - 0002 - 0023 - 0092 -0196 .0277 .0266 .0155 

.9990 .9119 .9026 .9115 .9081 .8833 .8197 i6377 !7032 

5.0 .0643 .0760 .0897 .1.037 .1146 .1195 .1160 1013 0701 

•SS "^2 - 21 25 * 2029 - 1695 - 13 * 8 -1077 0877 :0689 

.0000 .0000 .0J02 .0013 .0043 .0092 .0150 .0198 .0192 

•2222 -22?? -°25 2 - 0184 - 0327 .0400 .0374 .0267 ioiei 

.8041 
,0733 



.9990 .8216 .8334 .8086 7367 .6120 .4664 ^330 



6.0 .1810 .1949 .?025 .2005 .1398 .1719 .1476 1162 

-3257 .2999 .2390 .2017 .1808 .1607 .1370 *1098 '0787 

.0000 .0006 .0039 .0098 .0169 0233 0277 0283 0222 

■SS -2521 -?5« 7 S - 0643 : ° 3 ' 6 : ° 247 

.7019 .6364 .4258 .1582 -.0001 -.0070 .1535 5261 9197 

7,0 'illl -S8S iStf -?KS -* 258 -» 93 - 152 * *06i 5 

'S!? "5 3 3c ,451? - 3573 -2792 .2170 .1664 .1234 .0831 

• 0 2?9 ,0 i 13 '9llt - 0434 - 0438 -O^O -P346 .0230 

•\lni 'llii -?J?? - 0608 - 0451 - 0365 0308 - 0247 

-.5402 -.7680 -.6867 -.4964 -.1898 .1979 .5635 .8251 .9667 

8.0 .2741 .2*40 .2185 .1947 .1712 .1470 .1209 0913 0546 

.7931 .4995 .3659 .2839 .2259 1814 1448 1123 0783 

•EE -?SS -??S - 0678 - 0572 : ° 4 " :mS :o203 

.3614 .2161 .1526 .1150 .0893 .0703 .0548 0416 0285 

n -.0758 .1014 .2561 .4003 .5382 6705 7946 9026 9779 

9 *° -??I5 -22? - 0389 -? 533 - 0616 -0640 0603 IOSCO ! 

'im 'SS "So J -i?£ 5 - 1499 - 1162 - 0938 - 0785 0608 

•??Si •?$?? -?S2 3 - 0513 - 0422 - 0330 -° 23 5 .0129 

*Sm 'ilS -iiS - 0845 - 0683 - 0540 - 0 * 08 -0267 

.4953 -.1424 -.2467 -.1395 .0673 .3393 .6279 .8561 .9745 
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Table of the False Positive Error and its 
S.E.*SQRT(M), the False Negative Error and its 
S.E.*SQRT(M), and the Correlation between FP and FN 
Number of Items: 10, Theta Zero: .80, Mastery Score: 9 



Test KR21=

Mean    .100   .200   .300   .400   .500   .600   .700   .800   .900

1.0    .0000  .0000  .0000  .0000  .0001  .0004  .0013  .0034  .0056
       .0000  .0000  .0001  .0004  .0015  .0046  .0109  .0174  .0141
       .0000  .0000  .0000  .0000  .0000  .0000  .0002  .0018  .00?0
       .0000  .0000  .0000  .0000  .0000  .0004  .0039  .017?  .????
       .9784  .9741  .9635  .9704  .9920  .9934  .9928  .9863  .9262

2.0    .0000  .0000  .0001  .0004  .0010  .0024  .0053  .0095  .0120
       .0003  .0007  .0016  .0038  .0085  .0164  .0251  .0272  .01??
       .0000  .0000  .0000  .0000  .0000  .0002  .0014  .0064  .01??
       .0000  .0000  .0000  .0000  .0003  .0031  .0145  .0362  .041?
       .9530  .9751  .9496  .9800  .9844  .9860  .9330  .9661  .8541

3.0    .0003  .0006  .0012  .0024  .0047  .0086  .0138  .0186  .0184
       .0032  .0057  .0100  .0172  .0274  .0368  .0395  .0319  .0199
       .0000  .0000  .0000  .0000  .0002  .0013  .0055  .0155  .0285
       .0000  .0000  .0000  .0003  .0033  .0149  .0371  .0577  .0464
       .9370  .9402  .9594  .9692  .9735  .9717  .9589  .9112  .7424

4.0    .0027  .0042  .0067  .0105  .0159  .0221  .0274  .0294  .0242
       .0134  .0264  .0375  .0501  .0580  .0562  .0459  .0322  .0230
       .0000  .0000  .0000  .0002  .0016  .0062  .0160  .0308  .0416
       .0000  .0000  .0004  .0049  .0207  .0468  .0708  .0743  .0459
       .9990  .9990  .9442  .9519  .9490  .9304  .8801  .7520  .6636

5.0    .0143  .0192  .0257  .0331  .0397  .0438  .0444  .0400  .0283
       .0?63  .0319  .0939  .0914  .0769  .0593  .0448  .0350  .0274
       .0000  .0000  .0004  .0029  .0097  .0217  .0373  .0522  .0543
       .0000  .0006  .0111  .0411  .0769  .1004  .1013  .0780  .0455
       .9990  .9023  .9126  .8974  .8482  .7474  .5874  .4579  .7377

6.0    .0551  .0649  .0723  .0750  .0732  .0677  .0591  .0?70  .0299
       .1677  .?606  .1212  .0910  .0751  .0648  .0550  .0442  .0319
       .0000  .0012  .0083  .0221  .0396  .0572  .0713  .0768  .0640
       .0007  .0425  .1180  .1599  .1626  .1402  .1056  .0707  .????
       .8402  .8179  .6903  .4538  .2225  .1089  .1527  .4256  .8863

7.0    .1579  .14?8  .1281  .1108  .0945  .0789  .0635  .0472  .0283
       .2010  .2300  .1977  .1561  .1210  .0930  .0704  .0516  .0344
       .0078  .0401  .0708  .0931  .1071  .1130  .1104  .0972  .0676
       .3395  .4360  .3341  .2393  .1709  .1249  .0963  .0602  .0683
      -.0746 -.6418 -.6135 -.4787 -.2593  .0509  .4250  .7641  .9588

8.0    .???7  .1???  .1?16  .0884  .0761  .0642  .0519  .0386  .022?
       .4412  .2615  .1332  .1370  .1056  .0824  .0641  .0484  .0329
       .2692  .2459  .2247  .2036  .1817  .1582  .1313  .1007  .0609
       .6748  .4299  .3206  .2545  .2082  .1723  .1421  .1140  .0828
      -.1919 -.0050  .1674  .3327  .4918  .6430  .7817  .8987  .9776

9.0    .0013  .0100  .0192  .0255  .0286  .0290  .0266  .0215  .0132
       .0758  .1401  .1174  .?872  .0645  .0497  .0406  .0341  .0259
       .2629  .2435  .2153  .1861  .1576  .1296  .1014  .0721  .0397
       .3631  .3936  .3626  .3052  .2512  .2042  .1628  .1237  .0814
      -.1701 -.5140 -.4586 -.2716  .0065  .3383  .6505  .8697  .9768
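
As a reading aid for the tabled S.E.*SQRT(M) entries, the following minimal Python sketch converts them into standard errors for a given number of examinees M, and uses the tabled correlation to approximate the standard error of the total error rate FP + FN. The function names are hypothetical, and the variance-of-a-sum formula is the usual one for two correlated estimates; it is offered as an illustration, not as part of the original program.

```python
import math


def standard_error(scaled_se, m):
    """Turn a tabled S.E.*SQRT(M) entry into a standard error for m examinees."""
    return scaled_se / math.sqrt(m)


def se_of_total_error(scaled_se_fp, scaled_se_fn, rho, m):
    """Approximate S.E. of FP + FN from the two scaled S.E.s and their correlation."""
    scaled_var = (scaled_se_fp ** 2 + scaled_se_fn ** 2
                  + 2.0 * rho * scaled_se_fp * scaled_se_fn)
    return math.sqrt(scaled_var / m)


# Entries read at Mean = 8.0, KR21 = .500 (10 items, theta zero .80, mastery score 9):
# S.E.*SQRT(M) of FP = .1056, S.E.*SQRT(M) of FN = .2082, correlation = .4918.
se_fp = standard_error(0.1056, 100)
se_total = se_of_total_error(0.1056, 0.2082, 0.4918, 100)
print(round(se_fp, 5), round(se_total, 5))
```

With M = 100 examinees the tabled entries are simply divided by 10, which is why the tables are scaled by SQRT(M).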
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APPENDIX B 

SUBROUTINE ERRFPN 

This subroutine computes the false positive error estimate and 
its standard error, the false negative error estimate and its 
standard error, and the correlation between the two estimates. The 
beta-binomial distribution is used as the vehicle for computations. 

Disclaimer: The computer program hereafter listed has been written 
with care and tested extensively under a variety of conditions 
using tests with 60 or fewer items. The author, however, makes no 
warranty as to its accuracy and functioning, nor shall the fact of 
its distribution imply such warranty. 
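
For readers without access to the SSP and IMSL routines the listing relies on, the point estimates (not the standard errors) can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the incomplete beta function is evaluated with a standard continued fraction rather than with MDBETA, and the false positive and false negative errors are taken to be Pr(x >= c, theta < theta zero) and Pr(x < c, theta >= theta zero) under the beta-binomial model.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            if abs(c) < tiny:
                c = tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(a * math.log(x) + b * math.log(1.0 - x) - log_beta(a, b))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def error_rates(n, alpha, beta, theta0, c):
    """Return (false positive, false negative) error rates for cutoff score c."""
    fp = fn = 0.0
    for x in range(n + 1):
        log_fx = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                  + log_beta(alpha + x, beta + n - x) - log_beta(alpha, beta))
        fx = math.exp(log_fx)
        # Posterior probability that the true score lies below theta0, given x.
        below = reg_inc_beta(alpha + x, beta + n - x, theta0)
        if x >= c:
            fp += fx * below
        else:
            fn += fx * (1.0 - below)
    return fp, fn


# 10 items, alpha = 8, beta = 2 (mean 8.0, KR21 = .500), theta zero .80, mastery score 9.
fp, fn = error_rates(10, 8.0, 2.0, 0.80, 9)
print(round(fp, 4), round(fn, 4))
```

The example parameters correspond to the Mean = 8.0, KR21 = .500 cell of the mastery score 9 table, so the printed values can be compared with the tabled FP and FN entries.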
      SUBROUTINE ERRFPN(N,A,B,M,TT,IM,FP,SEFP,FN,SEFN,RHO)
C
C     THE BETA-BINOMIAL DISTRIBUTION IS USED AS THE VEHICLE FOR
C     COMPUTATIONS.
C     INPUT DATA ARE:
C
C     N.....NUMBER OF ITEMS
C     A.....ALPHA OF THE BETA DISTRIBUTION
C     B.....BETA OF THE BETA DISTRIBUTION
C     M.....NUMBER OF EXAMINEES
C     TT....THETA ZERO, THE CRITERION LEVEL SET IN THE TRUE SCORE
C     IM....TEST CUTOFF SCORE (MASTERY SCORE)
C     A, B, AND TT ARE IN THE DOUBLE PRECISION FORMAT.
C     OUTPUT DATA ARE:
C
C     FP....FALSE POSITIVE ERROR ESTIMATE
C     SEFP..STANDARD ERROR OF FP
C     FN....FALSE NEGATIVE ERROR ESTIMATE
C     SEFN..STANDARD ERROR OF FN
C     RHO...CORRELATION BETWEEN FP AND FN
C     ALL OUTPUT DATA ARE IN THE DOUBLE PRECISION FORMAT.
C     THE SUBROUTINE IS SET UP FOR TESTS WITH UP TO 60 ITEMS

I s tttt^^j?jE5r»™ •■»<•>■ 1 

C     EXTERNAL SUBROUTINES REQUIRED: DQG32 OF SSP
C                                    MDBETA OF IMSL
C

EXTERNAL BETA , BI t GFCT , DFCT , PSI £00 

OHE-1.D0 

Y1-BETA(A,E) 

Y2-PSI(A+B) 

Y3-PSI(A)-Y2 <50 

Y4-PSI(B)-Y2 460 

Pl-PSI(DFLOAT(N)+A+B) *™ 
CALL NEHY2(:J,A,B f DF) 

c CALL VARA£(K, A, E f HI, H2,H3 l K,DF t DA l DB) Jgj 

C SET UP FOR FALSE POSITIVE ERRORS eJn 
T2"TT 

IC-IM 530 

U-A+DFLOAT(IC) 540 

V-B+DFLOAT(IMC) 550 

Wi-0. 560 

W2-0. 570 

: 530 

DO 40 L-1,2 590 

: 600 

F-ONE-TZ 61 0 

DX-DFCT(U,V,TZ) 620 

GA-GFCT(U.V,TZ) 630 

GB-GFCT(V,U,F) |*° 

660 
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BB-BI(N,IC) 670 

E(L)-DX*BB 680 

DFPA-GA*BB 690 

C 700 

BA-BETA(U,V) 710 

PA-PSI(V) 7i>0 

DFPB-(BA*(PA-P1)-GB)*BB 730 

C 74C 

C 750 

IF(IC.EQ.N) GO TO 30 760 

C 7 70 

10 IZ-K-IC 780 

DO 15 I-l.IZ 790 

IX-IC+I 80C 

VMONE-V-OHE 810 

Zl— <TZ**U) *F**VMONE 820 

Z2-Z1*DL0G(TZ) 830 

Z3- (F**VMD11E) * (TZ**U) *I LOG (F) 840 

C 850 

CA- (Z2+BX+U*GA) /VMO^E 360 

C 870 

C 880 

DX- (Z1+U*DX) /VMONE 890 

O00 

BB-BE*(N-IX+1)/IX 91C 

C 920 

V-V-ONE O30 

M-BA*U/V 940 

C 950 

GB- (Z3- (BA-DX)+U*GB)/VMOHE $60 

C 970 

U-U+ONE 980 

PA-PA-ONE/V 99C 

C 1000 

C 1010 

E(L)-E(L)+BB*DX 1020 

DP* A-DFPA+BB*GA 1030 

DFP5-DFPB+BB* (BA* (PA-P 1) -GB) 1 040 

15 CONTINUE 1050 

30 IF(L.EQ.l) GOTO 35 1060 

C 1070 

C IHTERCHANGE DFPA AND DFPB FOR FALSE NEGATIVE ERROR lGfcO 

C 1090 

F-DFPA 1100 

DFPA-DFPS 1110 

DFPL-F 1120 

C 1130 

35 E(L)-E(L)/Y1 1140 

DFPA-DFPA/Y1-E(L)*Y3 1150 

DFPB-DFPB/Y1-E(L)*Y4 1160 

W1-W1+DFPA 1170 

W2-W2+DFPB 1180 

C 1190 

? ?.200 

S (L)-(H1*DFPA**2+H2*DFPB**2+2*H3*DFPA*DFP^^**. 5D0 121C 

C 1220 

C SET UP FOR FALSE NEGAT^T. £RRORS 1230 

TZ-OHE-TT 124C 

ION-IM+1 1250 

U-B+DFLOAT(IC) 1260 

V-A-fDFLOAT(N-IC) 1270 

C 1280 

4') CONTINUE 1290 

C 1300 

FP-E(l) 1310 

rN-E(2) :.320 
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SEFP-SCD 

SEFN-S(2) . l "0 

RHO -(HW1^2+H2^2^2+2.*H3^1*W2-Sa)**2-S(2)**2)/(S(l)*S(2)*2)1350 

C 1360 

RETURN "70 

END 1380 

      DOUBLE PRECISION FUNCTION BI(N,M)
C     BINOMIAL COEFFICIENT, N OVER M
      BI=1.D0
      IF(M*(N-M).EQ.0) GOTO 20
      MM=MIN0(M,N-M)
      DO 15 J=1,MM
   15 BI=BI*(N-J+1)/J
   20 RETURN
      END
      SUBROUTINE NEHY2(N,A,B,F)
C     F(K+1) IS THE BETA-BINOMIAL (NEGATIVE HYPERGEOMETRIC)
C     PROBABILITY OF A TEST SCORE OF K, FOR K = 0,...,N
      DOUBLE PRECISION A,B,F(1),Z1,Z2
      Z1=DFLOAT(N)+B
      Z2=Z1+A
      F(1)=1.D0
      DO 5 I=1,N
    5 F(1)=F(1)*(Z1-DFLOAT(I))/(Z2-DFLOAT(I))
      K=0
   10 KP1=K+1
      KP2=K+2
      F(KP2)=F(KP1)*DFLOAT(N-K)*(A+DFLOAT(K))/
     1(DFLOAT(KP1)*(Z1-DFLOAT(KP1)))
      K=K+1
      IF(K-N) 10,15,15
   15 RETURN
      END
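
The recurrence that NEHY2 uses for the beta-binomial (negative hypergeometric) probabilities can be sketched in Python as follows. This is a hedged illustration only: the function name is hypothetical, and the list is indexed from score 0 rather than from 1 as in the Fortran array F.

```python
def beta_binomial_pmf(n, alpha, beta):
    """Beta-binomial probabilities f(0), ..., f(n) built by forward recurrence."""
    z1 = n + beta
    z2 = z1 + alpha
    # f(0) is the product of (n + beta - i) / (n + alpha + beta - i), i = 1..n.
    f0 = 1.0
    for i in range(1, n + 1):
        f0 *= (z1 - i) / (z2 - i)
    probs = [f0]
    # f(k+1) = f(k) * (n - k)(alpha + k) / ((k + 1)(n + beta - k - 1)).
    for k in range(n):
        probs.append(probs[k] * (n - k) * (alpha + k) / ((k + 1) * (z1 - (k + 1))))
    return probs


probs = beta_binomial_pmf(10, 8.0, 2.0)
print(round(sum(probs), 6), round(probs[10], 6))
```

Building the probabilities by recurrence avoids evaluating gamma functions of large arguments, which is presumably why the listing takes this route.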

W p " EC JI2.^:S:Ji? u ' BI2 ' B "' ,, ' w ' v '' v " igfg 



B11-0.D0 
B12-0.D0 



1680 



B22-0.D0 "oO 

do 15 1-1 npi f;±x 

B11-B11+DA(I)*DA(I)*F(I) tjfo 

B12-B12+DA(I)*DB(I)*F(I) ^40 

15 B22-B22+DB(I)*DB(I)*F(I) £75^ 



B11-B11*M 
B12-B12*M 



1760 
1770 



B22-B22*M tiiX 

D-B11*B22-B12*B12 t?Qn 

VA-B22/D {I™ 

VB-B11/D J 8 ™ 

VAB-B12/D ,820 

RETURN tnin 
END 

      SUBROUTINE DERLAB(N,A,B,DA,DB)
C     DA(K+1) AND DB(K+1) ARE THE DERIVATIVES, WITH RESPECT TO A
C     AND B, OF THE LOG PROBABILITY OF A TEST SCORE OF K
      DIMENSION DA(1),DB(1)
      DOUBLE PRECISION A,B,DA,DB,Z1,Z2
      DOUBLE PRECISION ONE
      ONE=1.D0
      DA(1)=0.D0
      DB(1)=0.D0
      Z1=DFLOAT(N)+B
      Z2=Z1+A
      NP1=N+1
      DO 5 I=1,N
      DA(1)=DA(1)-ONE/(Z2-DFLOAT(I))
    5 DB(1)=DB(1)+ONE/(Z1-DFLOAT(I))
      DB(1)=DB(1)+DA(1)
C
      DO 10 I=1,N
      IP1=I+1
      IX=I-1
      DA(IP1)=DA(I)+ONE/(A+DFLOAT(IX))
   10 DB(IP1)=DB(I)-ONE/(Z1-DFLOAT(I))
      RETURN
      END

      DOUBLE PRECISION FUNCTION PSI(X)
C     DIGAMMA FUNCTION.  ZETA(N) HOLDS THE RIEMANN ZETA VALUES AND
C     Y(I) HOLDS PSI AT 1.745+.005*(I-1), I = 1,...,54
      DOUBLE PRECISION X,A,P,ZETA(99),Y(54),PSI1,PM1,PP1,PM2,P2M1
C
      ZETA(2) =1.64493406684822643647D0
      ZETA(3) =1.20205690315959428540D0
      ZETA(4) =1.08232323371113819152D0
      ZETA(5) =1.03692775514336992633D0
      ZETA(6) =1.01734306198444913971D0
      ZETA(7) =1.00834927738192282684D0
      ZETA(8) =1.00407735619794433938D0
      ZETA(9) =1.00200839282608221442D0
      ZETA(10)=1.00099457512781808534D0
      ZETA(11)=1.00049418860411946456D0
      ZETA(12)=1.00024608655330804830D0
      ZETA(13)=1.00012271334757848915D0
      ZETA(14)=1.00006124813505870483D0
      ZETA(15)=1.00003058823630702049D0
      ZETA(16)=1.00001528225940865187D0
      ZETA(17)=1.00000763719763789976D0
      ZETA(18)=1.00000381729326499984D0
      ZETA(19)=1.00000190821271655394D0
      ZETA(20)=1.00000095396203387280D0
      ZETA(21)=1.00000047693298678781D0
      ZETA(22)=1.00000023845050272773D0
      ZETA(23)=1.00000011921992596531D0
      ZETA(24)=1.00000005960818905126D0
      ZETA(25)=1.00000002980350351465D0
      ZETA(26)=1.00000001490155482837D0
      ZETA(27)=1.00000000745071178984D0
      ZETA(28)=1.00000000372533402479D0
      ZETA(29)=1.00000000186265972351D0
      ZETA(30)=1.00000000093132743242D0
      ZETA(31)=1.00000000046566290650D0
      ZETA(32)=1.00000000023283118337D0
      ZETA(33)=1.00000000011641550173D0
      ZETA(34)=1.00000000005820772088D0
      ZETA(35)=1.00000000002910385044D0
      ZETA(36)=1.00000000001455192189D0
      ZETA(37)=1.00000000000727595984D0
      ZETA(38)=1.00000000000363797955D0
      ZETA(39)=1.00000000000181898965D0
      ZETA(40)=1.00000000000090949478D0
      ZETA(41)=1.00000000000045474738D0
      ZETA(42)=1.00000000000022737368D0
C
      Y(1) =.2436449038D0
      Y(2) =.2474724535D0
      Y(3) =.2512859559D0
      Y(4) =.2550855103D0
      Y(5) =.2588712154D0
      Y(6) =.2626431686D0
      Y(7) =.2664014664D0
      Y(8) =.2701462043D0
      Y(9) =.2738774769D0
      Y(10)=.2775953776D0
      Y(11)=.2812999992D0
      Y(12)=.2849914333D0
      Y(13)=.2886697707D0
      Y(14)=.2923351012D0
      Y(15)=.2959875133D0
      Y(16)=.2996270966D0
      Y(17)=.3032539367D0
      Y(18)=.3068681205D0
      Y(19)=.3104697335D0
      Y(20)=.3140588602D0
      Y(21)=.3176355846D0
      Y(22)=.3211999895D0
      Y(23)=.3247521572D0
      Y(24)=.3282921691D0
      Y(25)=.3318201056D0
      Y(26)=.3353360467D0
      Y(27)=.3388400713D0
      Y(28)=.3423322577D0
      Y(29)=.3458126835D0
      Y(30)=.3492814255D0
      Y(31)=.3527385596D0
      Y(32)=.3561841612D0
      Y(33)=.3596183049D0
      Y(34)=.3630410646D0
      Y(35)=.3664525136D0
      Y(36)=.3698527244D0
      Y(37)=.3732417688D0
      Y(38)=.3766197179D0
      Y(39)=.3799866424D0
      Y(40)=.3833426119D0
      Y(41)=.3866876959D0
      Y(42)=.3900219627D0
      Y(43)=.3933454805D0
      Y(44)=.3966583163D0
      Y(45)=.3999605371D0
      Y(46)=.4032522088D0
      Y(47)=.4065333970D0
      Y(48)=.4098041664D0
      Y(49)=.4130645816D0
      Y(50)=.4163147060D0
      Y(51)=.4195546030D0
      Y(52)=.4227843351D0
      Y(53)=.4260039643D0
      Y(54)=.4292135520D0
C
C     FOR X BELOW 1 WORK WITH X+1 AND CORRECT BY -1/X AT LABEL 40
      A=X
      IF(X.LT.1.D0) A=X+1.D0
      PSI1=-.5772156649D0
C
      IF(A.GT.1.D0)GO TO 5
C
      PSI=PSI1
      RETURN
C
    5 PSI=0.D0
C
      IF(A.LT.2.D0)GO TO 20
C
C     REDUCE A TO THE INTERVAL (1,2) BY PSI(A)=PSI(A-1)+1/(A-1)
   10 A=A-1.D0
      PSI=PSI+1.D0/A
      IF(A.LT.2.D0)GO TO 20
      GO TO 10
C
   20 IF(A.GT.1.75D0)GO TO 35
      IF(A.GT.1.D0) GOTO 21
      PSI=PSI+PSI1
      RETURN
C
C     ZETA SERIES FOR PSI(1+A), WITH A NOW IN (0,.75)
   21 A=A-1.D0
      L=-23.21647129D0/DLOG(A)+1
      IF(L.LT.2)L=2
      M=MIN0(L,42)
C
      DO 25 N=2,M
   25 PSI=PSI+(-1)**N*ZETA(N)*A**(N-1)
      PSI=PSI+PSI1
      IF(M.EQ.L) GOTO 40
C
      M1=M+1
      DO 30 N=M1,L
      ZETA(N)=(ZETA(N-1)+1.D0)*.5D0
   30 PSI=PSI+(-1)**N*ZETA(N)*A**(N-1)
      GOTO 40
C
C     FOUR-POINT INTERPOLATION IN THE TABLE Y FOR A IN (1.75,2)
   35 P=(A-1.745D0)*200.D0
      IZ=DINT(P+1.D-10)
      IF(IZ.LT.1) IZ=1
C
      P=P-DFLOAT(IZ)
      IZ=IZ+1
C
      IF(P.NE.0.D0) GOTO 37
C
C     A FALLS ON A TABLE NODE.  THE TABLE VALUE IS ADDED TO THE SUM
C     ACCUMULATED AT LABEL 10, AS IN THE INTERPOLATION BRANCH BELOW
      PSI=PSI+Y(IZ)
      GOTO 40
C
   37 PM1=P-1.D0
      PP1=P+1.D0
      PM2=P-2.D0
      P2M1=PM1*PP1
      PSI=-P*PM1*PM2/6.D0*Y(IZ-1)+P2M1*PM2/2.D0*Y(IZ)-
     1P*PP1*PM2/2.D0*Y(IZ+1)+P*P2M1/6.D0*Y(IZ+2)+PSI
C
   40 IF(X.LT.1.0) PSI=PSI-1.D0/X
      RETURN
      END
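
The values produced by PSI can be cross-checked with a short Python sketch of the digamma function. Note that this is a different technique from the listing: instead of the zeta series and the interpolation table, it shifts the argument upward with the recurrence psi(x+1) = psi(x) + 1/x and then applies the standard asymptotic expansion. The function name is hypothetical.

```python
import math


def digamma(x):
    """Digamma function for x > 0 via upward recurrence and an asymptotic series."""
    if x <= 0.0:
        raise ValueError("this sketch only handles positive arguments")
    shift = 0.0
    # psi(x) = psi(x + 1) - 1/x, applied until the argument is at least 10.
    while x < 10.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # 1/(12x^2) - 1/(120x^4) + 1/(252x^6) - 1/(240x^8) + 1/(132x^10)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (
        1.0 / 252 - inv2 * (1.0 / 240 - inv2 / 132))))
    return shift + math.log(x) - 0.5 / x - series


# Arguments 1.75 and 2.0 correspond to table entries Y(2) and Y(52) of the listing.
print(round(digamma(1.75), 10), round(digamma(2.0), 10))
```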

DOUBLE PRECISION FUNCTION GFCr (U ( V,TZ) 3690 

EXTERNAL FCT.DFCT 3700 
DOUBLE PRECISION U , V , TZ , VP , UP , DFCT , ONE , H , XL # XU , FCT , Y , Yl , YHOLD , E PS 3710 

DOUBLE PRECISION DX.TWO 3720 

COMMON UF.VP 1^730 

TCO2.D0 37^0 

C 3750 

C 3760 

IER-0 3770 

XL-O.DO 3780 

XU-7Z 3790 

ONE- 1. DO 3800 

EPS". 00005 3^10 

KL-15 3C20 

IU-U-TWO 3830 

IF(U.LE.TWO) IU-0 3040 

UP- U- DFLOAT ( IU) 3850 

IV-V-TVO 3860 

IF(V.LE.TUO) IV-0 3370 

VP-V-DFLOAT(IV) 3680 

C 389C 

DX«DFCT(UP , VP , TZ) 3900 

C 3910 

IF (L. LT. ONE) UP-UP+ONE 3920 

C 3933 

CALL DQG3 2 (XL, XL, FCT, YHOLD) 3940 

C 395A 

DO 6 J-2.KL 3960 
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Y-O.DO 
ML- 2** J 



3970 



H- ?Z/DFLOAT(ML) 

DO 5 I-l.ML J°°°, 

XL-DFLOAT(I-l)*H ^020 

5 2S Y f5 G32 ( XL - XU . F CT.Yl) Jgjg 

l ( W-™ 0U » /YHOLD) . LE . EPS) GOTO 7 4060 

6 YHOLD-Y 40?0 

7 GFCT-Y JJ}0 

IF(IER.NE.O)WRITEC6, 100)1'. V.TZ.ML.EPS IHn 

100 ,W IH GFCT AT U.V.THETA ZERO - ' .3F10.5/ 4140 

IF(U.GE.ONE) GOTO 9 ^12 

UP- UP-ONE f. 1 ^ 

YHOLD-TZ**UP*(ONE-TZ)**VP 7,™ 

^S! 0 ^*^ 1 - 00 ^ 2 ) "ONE/ (CP+VP) ) -DX*VP/ (UP+VP) 4210 

GFCT- (UP+VP ) *GFCT/UP+H/UP 4220 

9 IF(IU.EQ.O) GO TO 20 Jjjg 

DO 10 1-1,10 H 5 A 

YHOLD-TZ**UP*(ONE-TZ)**VP 7„n 

H-YHOLD* (DLOG (TZ ) -ONE/ (UP+VP) ) -DX*VP/ CUP+VP) 4280 

GFCT- (UP*GFCT-H) / (UP+VP ) ' ' ' ?29n 

DX-(-YHOLD+UP*DX)/ (UP+VP) Z^OO 
10 UP-UP+ONE 

20 IF(IV.EQ.O) RETURN 

DO 30 I-l.IV J340 

YKOLD-TZ**U*(0rJE-TZ)**vp 2«n 

2:™°"£ (DL0G(TZ) -° IIE / C u+vp > > -DX*VP/ CU+VP) Z370 

GFCT-(GFCT*VP+H)/(U+VP) lil^ 

DX- (YHOLD+VP*DX) / (U+VP) 2ion 

30 VP-VP+ONE *?90 

4400 

gSSJSg"" ™ CI101 * »=»■•.«> 88 

DOUBLE PRECISION A , B , TZ , BETA JJfg 



AA-A 
BB-B 

7ZZ-7Z 



4470 
4480 
4490 



CALl/llDBETA(T22 , AA , B3 , P , IER) Jfjg 

IF(IER.nE.O) raiTE(6 l 100)A,B i TZ i IER 4538 

100 Kffi riSw) m ' * » » ™ « •, 3 ,ao.i.. I5 , 

KETUKil 

END ^ 60 

      DOUBLE PRECISION FUNCTION BETA(X,Y)
      DOUBLE PRECISION A,B,CON,X,Y,F

111 00 4600 

B-Y 4610 

11 1 4620 
      CON=1.D0
      IF(A.LE.F) GOTO 2
    1 A=A-1.D0
      CON=CON*A/(A+B)
      IF(A.LE.F) GOTO 2
      GOTO 1
    2 IF(B.LE.F) GOTO 4
    3 B=B-1.D0
      CON=CON*B/(A+B)
      IF(B.LE.F) GOTO 4
      GOTO 3
    4 BETA=DGAMMA(A)*DGAMMA(B)/DGAMMA(A+B)*CON
      RETURN
      END
      DOUBLE PRECISION FUNCTION FCT(T)
C     INTEGRAND T**(U-1)*(1-T)**(V-1)*LOG(T), SET TO ZERO AT 0 AND 1
      COMMON U,V
      DOUBLE PRECISION T,U,V
      FCT=0.D0
      IF(T.EQ.0.D0) RETURN
      IF(T.EQ.1.D0) RETURN
C
      FCT=T**(U-1.D0)*(1.D0-T)**(V-1.D0)*DLOG(T)
      RETURN
      END
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RELATIONSHIP BETWEEN DECISION ACCURACY AND 
DECISION CONSISTENCY IN MASTERY TESTING 



Huynh Huynh 
Joseph C. Saunders 

University of South Carolina 



ABSTRACT 

In mastery testing, decision accuracy refers to the proportion 
of examinees who are classified correctly, in one of several 
achievement categories, by test data. Decision consistency expresses 
the extent to which decisions agree across two test administrations. 
Based on twelve cases involving a wide range of α21 reliabilities, 
it was found that decision accuracy and decision consistency 
were almost perfectly related. 



1. INTRODUCTION 

In classical measurement theory and practice, the reliability 
of a set of measurements (often, albeit unfortunately, referred to 
as the reliability of a test) is typically defined as the ratio of 
true-score variance to observed-score variance. The assumptions 
of classical test theory imply that reliability can also be viewed 
as the correlation between two sets of parallel measurements 
(Lord & Novick, 1968). Capitalizing upon this property, several 
writers (Carver, 1970; Hambleton & Novick, 1973; Huynh, 1976c; 
Subkoviak, 1976) have proposed that reliability (of decisions) in 
mastery testing be considered from the standpoint of decision 
consistency (i.e., consistency of individual decisions across two 
test administrations). It has also been argued (Huynh, 1976b 
[for the case of Q=1]; Livingston & Wingersky, 1979; van der Linden & 
Mellenbergh, 1978; Subkoviak & Wilcox, 1978; Wilcox, 1977) that the 
quality of the decision-making process would be more appropriately 
assessed via the agreement between decisions based on test data 
and those based on true scores, had these been known. Such 
agreement, in its simplest form, may be expressed as the proportion of 
examinees who are correctly classified by the test scores. This 
quantity will be referred to as decision accuracy in subsequent 
sections of this paper. In a slightly different form, it has been 
called a validity coefficient by Berk (1976). Decision accuracy, 
in this context, presumes that false positive and false negative 
errors are weighted equally. When the weights (losses or utilities) 
are not equal, then coefficients based on decision theory, such as ε 
(Huynh, 1976b), δ (van der Linden & Mellenbergh, 1978), or γ (Wilcox, 
1978) may be more appropriate. However, decision consistency 
regards both types of inconsistent decision as being of equal severity. 
Thus, only the case involving equal (and constant) losses will be 
considered in this paper, so that comparisons might be anchored in 
the same framework. 

This paper has been distributed separately as RM 80-8, August, 1980. 

The purpose of this paper is to study the relationship between 
decision consistency and decision accuracy for a variety of situa- 
tions involving mastery tests. For reason of computational sim- 
plicity, the study is restricted to test score distributions which 
follow a beta-binomial form. 

2. COMPUTATIONAL PROCEDURES 

Let x and θ denote the observed and true score for a subject, 
and let c and θ_0 denote the corresponding passing scores for 
mastery classification. In addition, let y be the observed score 
for the same subject on a second (parallel) test administration. 
The raw index of decision consistency is defined as 

     p_xy = Pr(x < c, y < c) + Pr(x ≥ c, y ≥ c), 

and an index of decision accuracy may be taken as 

     p_xθ = Pr(x < c, θ < θ_0) + Pr(x ≥ c, θ ≥ θ_0). 

(Other indices similar to Cohen's kappa may also be used; however, 
since the marginal probabilities of the mastery and nonmastery 
categories as defined by the test scores x and y, and by the true 
score θ, are identical or almost identical, any relationship between 
the p indices would hold for the kappa indices.) 
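
To make the two definitions concrete, here is a minimal Python sketch that evaluates both indices under a beta-binomial model. Several things are assumptions made for the illustration: the function names are hypothetical, the beta parameters are taken as given (α = 8, β = 2 corresponds to a 10-item test with mean 8.0 and α21 = .500), and decision accuracy is obtained by Simpson's rule, which requires α ≥ 1 and β ≥ 1 so that the beta density is bounded.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def pass_probability(n, c, theta):
    """Pr(x >= c | theta) for a binomial test score on n items."""
    return sum(math.comb(n, x) * theta ** x * (1.0 - theta) ** (n - x)
               for x in range(c, n + 1))


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def decision_accuracy(n, alpha, beta, theta0, c):
    """p_x_theta = Pr(x < c, theta < theta0) + Pr(x >= c, theta >= theta0)."""
    norm = math.exp(log_beta(alpha, beta))

    def density(t):
        return t ** (alpha - 1.0) * (1.0 - t) ** (beta - 1.0) / norm

    fail_below = simpson(lambda t: density(t) * (1.0 - pass_probability(n, c, t)),
                         0.0, theta0)
    pass_above = simpson(lambda t: density(t) * pass_probability(n, c, t),
                         theta0, 1.0)
    return fail_below + pass_above


def decision_consistency(n, alpha, beta, c):
    """p_xy = Pr(x < c, y < c) + Pr(x >= c, y >= c) for two parallel forms."""
    total = 0.0
    for x in range(n + 1):
        for y in range(n + 1):
            if (x >= c) == (y >= c):
                s = x + y
                total += (math.comb(n, x) * math.comb(n, y)
                          * math.exp(log_beta(alpha + s, beta + 2 * n - s)
                                     - log_beta(alpha, beta)))
    return total


accuracy = decision_accuracy(10, 8.0, 2.0, 0.7, 7)
consistency = decision_consistency(10, 8.0, 2.0, 7)
print(round(accuracy, 3), round(consistency, 3))
```

The consistency index needs no numerical integration because, given the true score, the two administrations are independent binomials, so the joint probability of (x, y) reduces to a ratio of beta functions.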

When the test data can be described via a beta-binomial model, 
both indices p_xy and p_xθ may be computed via formulae, tables, and 
computer programs reported in Huynh (1979a, 1979b, 1980b, 1980c). 
Additionally, in the context of decision-making, it seems logical 
to select a (test) passing score c which reflects the true cutoff 
score θ_0 and the two (equal and constant) losses under consideration. 
When the beta-binomial model holds, the value c may be obtained via 
the incomplete beta functions (Huynh, 1976a). Let n be the number 
of items, and α and β be the two parameters of the beta 
distribution. Then the Bayesian passing score is the smallest integer c 
at which the incomplete beta function I(α+c, n+β-c; θ_0) is less than 
or equal to .5. In most instances involving minimax decisions 
(Huynh, 1980b), the value of c is very close to nθ_0; this simple 
expression will be used throughout this paper. 
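
The rule for the Bayesian passing score can be sketched in Python as a simple search over candidate scores. The sketch below is illustrative only: the function names are hypothetical, the incomplete beta function is evaluated with a standard continued fraction, and returning n + 1 when no score satisfies the condition is a convention adopted here, not something stated in the text.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            if abs(c) < tiny:
                c = tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I(a, b; x) for 0 < x < 1."""
    log_front = (a * math.log(x) + b * math.log(1.0 - x)
                 - (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def bayes_passing_score(n, alpha, beta, theta0):
    """Smallest integer c with I(alpha + c, n + beta - c; theta0) <= .5."""
    for c in range(n + 1):
        if reg_inc_beta(alpha + c, n + beta - c, theta0) <= 0.5:
            return c
    return n + 1  # assumed convention: no observed score leads to a pass


n, theta0 = 10, 0.7
print(bayes_passing_score(n, 8.0, 2.0, theta0), n * theta0)
```

The second printed value is the simple expression nθ_0, shown alongside the search result so the two can be compared for a given set of parameters.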

3. DATA BASE 

Two sets of test data were used in this study, one fictitious 
and the other derived from responses to the Science Research 
Associates Mastery Tests (SRA, 1974, 1975). The fictitious data 
set consists of eight beta-binomial distributions, each of which 
was selected to yield a testing situation in which the α21 
reliability was low or moderate. Table 1 contains descriptions 
of these cases. 
TABLE 1 

A Comparison of Decision Accuracy and 
Decision Consistency based on 
Moderately Reliable Beta-Binomial Test Scores 

Case  Shape        n      μ        σ      α21   θ_0    c    p_xθ   p_xy

 1    Unimodal     5     3.125   1.301   .385          3    .768   .687
 2    Symmetric    5     2.500   1.279   .294          3    .693   .605
 3    Unimodal    10     8.000   1.706   .500          7    .845   .799
 4    J-Shaped    10     9.000   1.500   .667          7    .941   .921
 5    Unimodal    20    12.000   3.024   .500         14    .773   .678
 6    Unimodal    20    16.000   2.646   .571         14    .868   .821
 7    Unimodal    30    16.000   3.801   .500   .8    24    .979   .964
 8    J-Shaped    30    29.250   1.319   .600   .8    24    .993   .990


Table 2 describes the second data set, which consists of four 
SRA-compiled tests. The SRA data were obtained from the South 
Carolina State Department of Education. The data, consisting of 
the item responses of approximately 3000 sixth grade students for 
the SRA Mathematics (form X) and SOBAR Reading (form L) tests, 
were collected in a field testing conducted in the spring of 1978. 
Artificial subtests of 10, 20, 30, and 40 items were created from 
the SRA data by random selection of items from sets of homogeneous 
objectives. 

TABLE 2 

Description of the SRA Mastery Tests Data 

Case   Subject Area   Number of Items    Mean     S.D.    α21

  9    Reading              10           7.016   2.391   .704
 10    Reading              20          12.268   4.787   .835
 11    Math                 30          15.666   5.901   .812
 12    Math                 40          19.552   7.439   .840


4. RESULTS AND DISCUSSION 







The data regarding decision accuracy and decision consistency 
are reported in the right side of Table 1 for the fictitious data 
set and in Table 3 for the SRA-compiled tests. In all situations 
under consideration, p_xy is smaller than p_xθ; the ratio of p_xy to 
p_xθ averages about .96. However, the correlation between the two 
indices is .993, which represents an almost perfect linear 
relationship. For the 12 cases under study, decision accuracy 
relates to decision consistency via the empirical formula 

     p_xθ = .25 + .75 p_xy. 


TABLE 3 

A Comparison of Decision Accuracy and 
Decision Consistency Based on Real Data 

Case   True Cutoff θ_0   Test Cutoff c   Decision Accuracy   Decision Consistency

  9         .50                5               .894                 .858
            .70                7               .828                 .780

 10                           10               .892                 .852
                              14               .870                 .826

 11         .50               15               .863                 .812
            .70               21               .893                 .853

 12         .50               20               .872                 .823
            .70               28               .922                 .892




This study indicates that there is little difference between 
the indices of decision accuracy and decision consistency in terms 
of ranking the quality of different test-based decision-making 
processes. Decision accuracy can be predicted with very little 
error from decision consistency. The relationship between the two 
indices thus parallels that of the two approaches to classical re- 
liability discussed in the introduction to this paper. 
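
The closeness of the empirical formula can be checked directly against the tabled values. The following Python sketch applies .25 + .75 p_xy to each (decision consistency, decision accuracy) pair printed in Tables 1 and 3 and reports the largest discrepancy. The variable and function names are hypothetical, and the sketch only re-uses the published figures; it does not re-estimate the formula.

```python
# (decision consistency p_xy, decision accuracy p_x_theta) pairs from Table 1.
table1_pairs = [(.687, .768), (.605, .693), (.799, .845), (.921, .941),
                (.678, .773), (.821, .868), (.964, .979), (.990, .993)]
# Pairs from Table 3 (two true cutoffs for each of cases 9 to 12).
table3_pairs = [(.858, .894), (.780, .828), (.852, .892), (.826, .870),
                (.812, .863), (.853, .893), (.823, .872), (.892, .922)]


def predicted_accuracy(consistency):
    """Empirical formula relating decision accuracy to decision consistency."""
    return 0.25 + 0.75 * consistency


pairs = table1_pairs + table3_pairs
residuals = [accuracy - predicted_accuracy(consistency)
             for consistency, accuracy in pairs]
largest = max(abs(r) for r in residuals)
print(len(pairs), round(largest, 4))
```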

The basic result of this study casts doubt on the conjecture 
by Mellenbergh and van der Linden (1979, p. 263) that "the 
consistency of decisions is not related in the same way to the 
association between decisions and true states as consistency of 
measurements is related to the reliability coefficient." The very 
basic assumption which underlies our conclusion is that the test 
passing score must reflect in some way the true cutoff score and 
the various losses which are incorporated in the decision-making 
process. If this assumption is tenable, any comparison between 
decision accuracy and decision consistency would have no useful 
meaning if the test passing score and the true cutoff score were 
selected independently of each other. The counterexample 
presented by Mellenbergh and van der Linden (1979, p. 263) seems to 
reflect this type of selection. In addition, the above conjecture 
appears to be contradicted by the theoretical results reported by 
Huynh (1976c, 1978a), namely the fact that under fairly general 
assumptions, the raw agreement index and the kappa index for 
decision consistency are increasing functions of the classical 
reliability. Thus, both these indices of decision consistency 
across two test administrations reflect the nature of the 
relationship between true scores and observed scores. 

It should be pointed out that the indices of decision accuracy 
and of decision consistency are defined for a set of test scores 
collected from the administration of a test to a group of examinees. 
Both indices thus represent internal characteristics of the data. 
As may be recalled, the decision accuracy index considered in this 
paper presumes that losses associated with incorrect decisions are 
equal (and constant); it should be replaced by appropriate efficiency
indices when losses do not have this simple form. In this
case, the Huynh efficiency indices (Huynh, 1975, 1976b, 1980a), the
$\delta$ index proposed by van der Linden and Mellenbergh (1978), or the
Wilcox $\gamma$ index (1978) might be used. Because losses are often defined
as a function of the true ability (which is typically estimated
from test data), all these indices actually represent the
internal characteristics of the data; they do not appear to be re- 
flective of any other trait which might relate to the test itself. 
Decision accuracy and other similar efficiency indices seem to act 
as counterparts of reliability in classical test theory. 

Finally, it may be noted that in many practical situations, 
losses are very hard to assess, and loss-based coefficients may
not be useful. For example, procedures for setting passing scores
are often based on an examination of the test items or on a consideration
of the objectives underlying the test. For situations
in which these procedures are appropriate, only the test passing
score is available for the evaluation of the internal character- 
istics of the test data; hence decision consistency may very well 
be the only characteristic of the data which could feasibly be 
used to assess reliability. The argument seems convincing that 
decisions based on test data would not be acceptable if they 
could not be replicated to a satisfactory degree by use of the 
data collected from another test administration. The practical 
implications of this study seemingly contradict the assertion by
Mellenbergh and van der Linden that "decision consistency and
reliability are not equivalent concepts" (1979, p. 270). Based 
on the results of this study, it appears that decision consistency 
acts very much like a counterpart of classical test reliability. 
A NOTE ON DECISION-THEORETIC
COEFFICIENTS FOR TESTS 



Huynh Huynh 
University of South Carolina 



ABSTRACT 

A modification is suggested for the decision-theoretic coefficient
$\delta$ proposed by van der Linden and Mellenbergh. Under
reasonable assumptions, the modified index varies from 0 to 1 inclusive.
It is argued that in many practical applications of
mastery testing, coefficients such as $\delta$ are not readily available,
and consistency of decisions may serve as evidence of the quality 
of the decision-making process. 

1. INTRODUCTION 

Coefficients for tests (or strictly speaking, for a set of 
measurements) derived from decision theory have been formulated 
in a variety of ways (Huynh, 1975, 1976; van der Linden & 
Mellenbergh, 1978). These coefficients are based on the reduction 
in the proportion of expected loss (or Bayes risk) which would 
result from using test scores in the decision-making process. 
The efficiency coefficient proposed by Huynh is defined as
$e = (R^* - R_0)/R^*$, where $R_0$ is the expected opportunity loss associated

This paper has been distributed separately as RM 80-4, July, 1980. 
with the best use of test scores. The denominator $R^*$ is the
minimum of a similar loss which would be incurred if decisions
were based on information having no relationship to the true 
ability of the individual subject. (It may be noted that the 
opportunity losses associated with perfect information, i.e., 
when decisions are always correct, are zero.) Using the notion 
of monotone decisions along with the assumption of monotone like- 
lihood ratio for the test score density, Huynh was able to prove 
that the efficiency index e ranges between 0 and 1 inclusive. 
The lowest value 0 occurs when test information is unrelated to 
the ability of the subject, and the upper bound 1 is reached when 
test scores reveal faithfully the ability of the subject. 

The decision-theoretic coefficient proposed by van der Linden
and Mellenbergh (1978) is defined as

$$\delta = \frac{R_n - R_o}{R_n - R_c},$$

where $R_o$ represents the expected loss associated with the use of
test scores. $R_c$ and $R_n$, on the other hand, are the expected
losses for situations in which the test contains complete and no
information about the true scores, respectively. These losses
are not necessarily opportunity losses. As defined, the coefficient
$\delta$ is 0 when test scores are unrelated to true ability, and
reaches the value 1 when test scores contain complete information
about true ability. However, as noted by van der Linden and
Mellenbergh (1978), the coefficient $\delta$ may not always lie within
the interval defined by 0 and 1. To overcome this deficiency,
Wilcox (1978) proposed that $R_n$ and $R_c$ be replaced with the upper
and lower bounds of the expected loss $R_o$. His index $\gamma$, then,
will range between 0 and 1. However, it is not known if these
bounds have direct interpretations in terms of the degree of relationship
between test score and true ability.

The purpose of this note is to modify the index $\delta$ slightly,
and to describe the situations in which the resulting index falls 
between 0 and 1. The assumptions are presented only for the case 
of binary (mastery versus nonmastery) classification; however, 
they may be generalized in a fairly simple manner to situations 
involving more than two classification categories.

2. GENERAL CONSIDERATIONS 

Consider a population of subjects for whom the true ability
$\theta$ is distributed according to the density $p(\theta)$ with $\Omega$ as range.
If there is only one subject in the population, then $p(\theta)$ represents
the prior density in the context of Bayesian statistics.
Let x represent the observed test score and $f(x|\theta)$ be its conditional
density with the real line as the range. Let $a_1$ be the
action of denying mastery status (the nonmastery category) and $a_2$
be the action of granting mastery (the mastery category). Following
the notation used in Ferguson (1967, chapter 6), let $L(\theta,a_1)$
and $L(\theta,a_2)$ be the losses associated with the actions $a_1$ and $a_2$.
In most formulations of mastery testing, it is usually assumed
that there exists a true cutoff ability $\theta_0$ such that action $a_1$ is
better than action $a_2$ when $\theta < \theta_0$ and the reverse is true when
$\theta > \theta_0$. To be consistent with these assumptions, the losses would
have to satisfy the following inequalities: $L(\theta,a_1) \le L(\theta,a_2)$
for $\theta < \theta_0$ and $L(\theta,a_1) \ge L(\theta,a_2)$ for $\theta > \theta_0$. Under these
conditions, the binary decision problem involving the actions $a_1$ and
$a_2$ is said to be monotone.

In practical situations, however, mastery/nonmastery decisions 
are usually based on observed test data. In general, it seems 
reasonable that mastery should be granted if the test score x is 
high, and nonmastery should be presumed if the test score is low.
In order that this type of classification be optimum in most
decision-theoretic contexts, it is traditionally assumed that the
conditional density $f(x|\theta)$ has monotone likelihood ratio. This
condition is fulfilled for test models involving the exponential, 
Poisson, normal, negative binomial, gamma, and beta distributions, 
and in general, distributions belonging to the one-parameter ex- 
ponential family (Ferguson, 1967, p. 208-209). In addition, the 
assumption of monotone likelihood ratio for $f(x|\theta)$ implies
(Lehmann, 1966; Dykstra, Hewett, & Thompson, 1973, p. 679,
definition) that x is positive likelihood ratio dependent upon $\theta$.
This result, in turn, implies that x and $\theta$ are stochastically
increasing in sequence (Dykstra et al., Theorem 2); that is, the
conditional distribution of x, $F(x|\theta)$, is nonincreasing in $\theta$.
Thus, when the monotone likelihood ratio assumption is fulfilled, 
the probability that a subject achieves a test score of x or lower 
is greater for subjects with lower ability. 
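
The stochastic ordering just described can be checked numerically. The following is only a minimal sketch, assuming a binomial test model (one of the monotone likelihood ratio families listed above); the test length n = 10, the cutoff c = 7, and the helper name binom_cdf_below are hypothetical choices made for illustration and do not come from the original study.

```python
from math import comb

def binom_cdf_below(c, n, theta):
    """Pr(x < c | theta) for a binomial test score on n items."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k) for k in range(c))

n, c = 10, 7                                  # hypothetical test length and cutoff
thetas = [i / 20 for i in range(21)]          # ability values from 0 to 1
probs = [binom_cdf_below(c, n, t) for t in thetas]

# Under monotone likelihood ratio, Pr(x < c | theta) should never rise
# as theta increases (a small tolerance absorbs rounding error).
is_nonincreasing = all(b <= a + 1e-12 for a, b in zip(probs, probs[1:]))
print(is_nonincreasing)
```

Any other cutoff between 0 and n + 1 may be substituted for c; the same check should hold for each of them.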

When $f(x|\theta)$ has monotone likelihood ratio, it is best to declare
mastery if the test score x is at least c, and declare nonmastery
if the test score x is smaller than c. The expected loss
(or Bayes risk) associated with the cutoff test score c is

$$R = \int_\Omega \int_{-\infty}^{c} L_1(\theta,a_1) f(x|\theta) p(\theta)\,dx\,d\theta + \int_\Omega \int_{c}^{\infty} L_2(\theta,a_2) f(x|\theta) p(\theta)\,dx\,d\theta,$$

or

$$R = \int_\Omega L_1(\theta,a_1) \Pr(x<c|\theta) p(\theta)\,d\theta + \int_\Omega L_2(\theta,a_2) \Pr(x \ge c|\theta) p(\theta)\,d\theta. \tag{1}$$

Consider now the first extreme case where x carries no information
about $\theta$, i.e., when x and $\theta$ are independent. For this
situation, the two probabilities $\Pr(x<c|\theta)$ and $\Pr(x \ge c|\theta)$ are free
of $\theta$, and the expected loss may be written as

$$R_n = \left[\int_\Omega L_1(\theta,a_1) p(\theta)\,d\theta\right] \Pr(x<c) + \left[\int_\Omega L_2(\theta,a_2) p(\theta)\,d\theta\right] \Pr(x \ge c). \tag{2}$$

The relationship between R and $R_n$ may be stated as follows.

Theorem 1. Let $L_1(\theta,a_1)$ be nondecreasing in $\theta$ and $L_2(\theta,a_2)$ be
nonincreasing in $\theta$. In addition, let $f(x|\theta)$ have monotone likelihood
ratio. Then $R \le R_n$.

Proof. Equation (1) may be written as

$$-R = E_\theta\left[\{-L_1(\theta,a_1)\} \Pr(x<c|\theta)\right] + E_\theta\left[L_2(\theta,a_2) \{-\Pr(x \ge c|\theta)\}\right].$$

All the functions $-L_1(\theta,a_1)$, $\Pr(x<c|\theta)$, $L_2(\theta,a_2)$, and $-\Pr(x \ge c|\theta)$
are nonincreasing in $\theta$, hence (Dykstra et al., 1973, p. 678)

$$-R \ge -\left[E_\theta L_1(\theta,a_1)\right] E_\theta \Pr(x<c|\theta) - \left[E_\theta L_2(\theta,a_2)\right] E_\theta \Pr(x \ge c|\theta),$$

or

$$-R \ge -R_n. \quad \text{Q.E.D.}$$

The assumptions regarding the variations of $L_1(\theta,a_1)$ and
$L_2(\theta,a_2)$ with respect to $a_1$ and $a_2$ seem intuitively justified.
The denial of mastery status probably should cause less harm to a 
subject with lower ability than to someone with higher ability. 
Granting mastery status, on the other hand, should entail lesser 
consequences for a high ability subject than to someone with 
lower ability. 

Consider now the second extreme case where the test score x
reveals fully the ability $\theta$ of the subject. It appears reasonable
to impose a strictly increasing function relating x to $\theta$.
Let $\theta_c$ be the image of the test cutoff score c on the true ability
scale $\theta$. Then it may be deduced that $\Pr(x<c|\theta) = 1$ when $\theta < \theta_c$ and
0 when $\theta \ge \theta_c$. On the other hand, $\Pr(x \ge c|\theta) = 0$ when $\theta < \theta_c$ and 1
otherwise. Thus, under the assumption of complete information,
the expected loss as expressed in (1) will be equal to

$$\int_{-\infty}^{\theta_c} L_1(\theta,a_1) p(\theta)\,d\theta + \int_{\theta_c}^{\infty} L_2(\theta,a_2) p(\theta)\,d\theta.$$

Under the monotone-decision conditions imposed previously on the
loss functions, it may be shown that this loss is minimized when
$\theta_c = \theta_0$. Hence the minimum complete-information expected loss may
be taken as

$$R_c = \int_{-\infty}^{\theta_0} L_1(\theta,a_1) p(\theta)\,d\theta + \int_{\theta_0}^{\infty} L_2(\theta,a_2) p(\theta)\,d\theta. \tag{3}$$

Theorem 2. Under the monotone-decision assumptions, the expected
loss R, computed at any test cutoff score, and the minimum
complete-information expected loss, $R_c$, satisfy the inequality
$R_c \le R$.

Proof. Consider the expected loss R of (1), which can be written
as

$$\begin{aligned}
R ={}& \int_{-\infty}^{\theta_0} L_1(\theta,a_1) \Pr(x<c|\theta) p(\theta)\,d\theta + \int_{-\infty}^{\theta_0} L_2(\theta,a_2) \Pr(x \ge c|\theta) p(\theta)\,d\theta \\
&+ \int_{\theta_0}^{\infty} L_1(\theta,a_1) \Pr(x<c|\theta) p(\theta)\,d\theta + \int_{\theta_0}^{\infty} L_2(\theta,a_2) \Pr(x \ge c|\theta) p(\theta)\,d\theta.
\end{aligned}$$

When $\theta < \theta_0$, $L_1(\theta,a_1) \le L_2(\theta,a_2)$, and when $\theta > \theta_0$, $L_2(\theta,a_2) \le
L_1(\theta,a_1)$. By noting that $\Pr(x<c|\theta) + \Pr(x \ge c|\theta) = 1$, it may then
be verified that $R \ge R_c$. Q.E.D.

The following corollary is immediate. 

Corollary. Let the loss $L_1(\theta,a_1)$ be nondecreasing in $\theta$, the loss
$L_2(\theta,a_2)$ be nonincreasing in $\theta$, and let the graphs of these
functions cross at a given point within the positive-probability
range of $\theta$. In addition, let $f(x|\theta)$ have monotone likelihood ratio.
Then the index $\delta = (R_n - R)/(R_n - R_c)$, in which $R_c$ is the minimum
complete-information expected loss, will be between 0 and 1 inclusive.
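
The corollary may be illustrated with a small numerical sketch. Everything specific below is an assumption introduced for illustration only: a binomial test of n = 10 items, a Beta(4, 2) ability density discretized on a grid, a true cutoff of 0.7, and the linear losses $L_1 = \theta$ and $L_2 = 2\theta_0 - \theta$, which are monotone in the required directions and cross at $\theta_0$. The function name delta_index is likewise hypothetical.

```python
from math import comb

def delta_index(n=10, c=7, theta0=0.7, a=4, b=2, grid=401):
    """Return (R, R_n, R_c, delta) for an assumed discretized Beta(a, b) prior."""
    thetas = [i / (grid - 1) for i in range(grid)]
    w = [t ** (a - 1) * (1 - t) ** (b - 1) for t in thetas]
    total = sum(w)
    w = [v / total for v in w]                       # prior weights p(theta)

    def loss1(t):                                    # L_1: deny mastery, nondecreasing
        return t

    def loss2(t):                                    # L_2: grant mastery, nonincreasing
        return 2 * theta0 - t

    def p_below(t):                                  # Pr(x < c | theta), binomial model
        return sum(comb(n, k) * t ** k * (1 - t) ** (n - k) for k in range(c))

    fail = [p_below(t) for t in thetas]
    # Equation (1): expected loss at the test cutoff c.
    R = sum(p * (loss1(t) * f + loss2(t) * (1 - f))
            for p, t, f in zip(w, thetas, fail))
    # Equation (2): expected loss when x and theta are independent.
    p_fail = sum(p * f for p, f in zip(w, fail))
    R_n = (sum(p * loss1(t) for p, t in zip(w, thetas)) * p_fail
           + sum(p * loss2(t) for p, t in zip(w, thetas)) * (1 - p_fail))
    # Equation (3): minimum complete-information expected loss.
    R_c = sum(p * (loss1(t) if t < theta0 else loss2(t))
              for p, t in zip(w, thetas))
    return R, R_n, R_c, (R_n - R) / (R_n - R_c)

R, R_n, R_c, delta = delta_index()
print(round(R, 4), round(R_n, 4), round(R_c, 4), round(delta, 4))
```

Under these assumed inputs, the three risks should be ordered as $R_c \le R \le R_n$ for every cutoff c, so that the computed index stays within the unit interval.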

3. RATIONALE FOR THE USE OF MINIMUM EXPECTED LOSS 

The use of the minimum expected loss for the case of a strictly
increasing relationship between x and $\theta$ guards against the seeming
contradiction in which the use of perfectly reliable test data
would cause more harm than the use of less-than-perfectly reliable 
test data. 

The bounds $R_n$ and $R_c$ for the expected loss R have fairly
straightforward psychometric interpretations. The upper limit
$R_n$ would occur if nonmastery and mastery status were randomly
assigned to examinees regardless of the test scores, keeping the
proportion of nonmasters equal to that of examinees having test
scores smaller than c, and the proportion of masters equal to that
of examinees having a test score of c or greater. The lower limit
$R_c$ corresponds to the best use of completely reliable test data.

It may be noted that both bounds ($R_n$ and $R_c$) are easy to compute,
given the quantities $p(\theta)$, $f(x|\theta)$, $L_1(\theta,a_1)$, and $L_2(\theta,a_2)$.
Thus, the index $\delta$ as defined in this note may be estimated in a
fairly straightforward manner for most situations involving the
use of test data to make decisions. This represents an advantage
over the Wilcox $\gamma$ (Wilcox, 1978, p. 610) which seems to involve
rather complex calculations.

4. SOME ADDITIONAL REMARKS

As additional remarks regarding the index $\delta$ proposed by van
der Linden and Mellenbergh (1978), some discrepancies are apparent
between its formulation and the various illustrations. The
authors argued that their index $\delta$ seemed more realistic than the
coefficient e defined in Huynh (1976) because $\delta$ was defined on
any chosen cutoff score while the e index relied on the optimum
cutoff score. But, in both illustrations based on squared-error
and linear losses, the optimum cutoff score was used in order to
reach the conclusion that the $\delta$ index was equal to the classical
reliability index. In addition, $\delta$ was presented as a coefficient
that represented the optimality of decisions (p. 133). Thus the
use of a less-than-optimal cutoff score in the formulation of $\delta$
seemed to contradict the very characteristic which $\delta$ was thought
to embrace.

Finally, the use of any decision-theoretic coefficient for 
tests presumes the availability of the losses (or utilities) 
associated with the various actions. In a number of practical 
situations, however, decisions regarding cutoff scores are not 
based on losses because they are not readily quantified or be- 
cause the decision-maker is not willing to use them. In many in- 
stances, for example, cutoff scores are derived from an exami- 
nation of item content or a consideration of the educational 
objectives. For these cases, the decision-theoretic coefficients 
as described in this paper are not available and the consistency 
of various decisions across two test administrations may serve as 
evidence of the quality of the decision-making process. It may 
be argued that decisions regarding success or failure for each 
subject may not be acceptable if they cannot be replicated to a 
reasonable extent on a second test administration. It is
cautioned, of course, that test-retest consistency for decisions
does not necessarily imply that the corresponding decisions are
reflective of the purposes that the decision-maker has in mind.
This line of reasoning is reminiscent of the well-accepted fact 
that in measurement, reliability is a necessary but not a 
sufficient condition for validity. 
ASSESSING EFFICIENCY OF DECISIONS
IN MASTERY TESTING 



Huynh Huynh 
University of South Carolina 



ABSTRACT 

Two indices are proposed for assessing the efficiency of 
decisions in mastery testing. The indices are generalizations of 
the raw agreement index and the kappa index. Both express the 
reduction in the proportion of average loss (or the gain in utility)
resulting from the use of test scores to make decisions. Empirical 
data are presented which show little discrepancy between estimates 
based on the beta-binomial and compound binomial models for one 
index. 



1. INTRODUCTION 

A primary purpose of mastery testing is to classify examinees 
in several achievement or ability categories. Typically, there are 
two such categories, which are often referred to as mastery (ready,
competent, or instructed) and nonmastery (nonready, incompetent, or
uninstructed) groups. Ideally, these categories are defined on the
basis of the true ability ($\theta$) of the subjects; however, in reality,

This paper has been distributed separately as RM 80-5, July, 1980. 
observed test scores are used to make mastery/nonmastery decisions.
Since observed test data are often fallible, decisions based there- 
upon are less than completely accurate or efficient. 

In the simplest formulation of mastery testing (Hambleton & 
Novick, 1973; Huynh, 1976a), the categories of true mastery and 
true nonmastery are defined respectively by the conditions $\theta \ge \theta_0$
and $\theta < \theta_0$, $\theta_0$ being a constant referred to as a criterion level by
Hambleton and Novick and a true mastery score by Huynh. A test is
given, and the observed test score x is obtained for each individual
examinee. A suitable test passing (cutoff, mastery) score c will
be chosen, and the examinee will be granted or denied mastery status
if the observed test score x is such that $x \ge c$ or $x < c$. The two
combinations ($\theta < \theta_0$; $x < c$) and ($\theta \ge \theta_0$; $x \ge c$) represent correct
decisions; they entail no (opportunity) losses in the decision
process. The other two possible combinations correspond to a false
positive error ($\theta < \theta_0$; $x \ge c$) and a false negative error ($\theta \ge \theta_0$;
$x < c$). Some form of loss function, such as constant, linear, or
squared error loss, is typically assigned to each of these errors
in most decision-theoretic formulations of mastery testing
(Hambleton & Novick, 1973; Huynh, 1976a, 1980b; van der Linden &
Mellenbergh, 1977).

Given various parameters defining the decision situation (such
as $\theta_0$; the number of test items; the losses incurred by misclassification;
and, when available, prior information regarding the individual
examinee or the group of examinees), a test passing score
may be determined by minimizing either the average loss (Bayesian 
or empirical Bayes passing score) or the maximum loss (minimax 
passing score). For example, where classification errors are 
weighted equally (e.g., when the false positive loss and the false 
negative loss are identical), an optimum passing score may be deter- 
mined by minimizing the sum of the probabilities of making such 
errors. Details regarding the determination of passing scores may 
be found in Huynh (1976a, 1980b). 
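
For the equal-loss case just mentioned, the search for a passing score can be sketched as follows. This is an illustrative sketch only, not the procedure of Huynh (1976a, 1980b): it assumes a binomial test of n = 10 items, a Beta(4, 2) ability density discretized on a grid, and a true mastery score of 0.7. The names error_rate and best_passing_score are hypothetical.

```python
from math import comb

def error_rate(c, n=10, theta0=0.7, a=4, b=2, grid=401):
    """Pr(false positive) + Pr(false negative) for passing score c."""
    thetas = [i / (grid - 1) for i in range(grid)]
    w = [t ** (a - 1) * (1 - t) ** (b - 1) for t in thetas]
    total = sum(w)
    err = 0.0
    for t, v in zip(thetas, w):
        below = sum(comb(n, k) * t ** k * (1 - t) ** (n - k) for k in range(c))
        # A true nonmaster errs by passing; a true master errs by failing.
        err += (v / total) * ((1 - below) if t < theta0 else below)
    return err

def best_passing_score(n=10, **kwargs):
    """Passing score in 0..n+1 with the smallest total error probability."""
    return min(range(n + 2), key=lambda c: error_rate(c, n=n, **kwargs))

c_best = best_passing_score()
print(c_best, round(error_rate(c_best), 4))
```

A passing score of 0 grants mastery to everyone and a score of n + 1 denies it to everyone; both are included in the search so that the degenerate rules serve as points of comparison.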

Once a passing score has been set for a test, an obvious ques- 
tion concerns the extent to which the test itself contributes to 
the quality of the decision-making process. The question may be 
answered in a variety of ways. For example, if the test scores are
used to identify students who need instructional remediation, then 
the detection of poor achievers (nonmasters) is important, and 
therefore a substantial false positive error rate may not be 
acceptable. In this context, a mastery test may be considered as 
effective or efficient if it yields a small false positive error 
rate. In most situations, however, some combination of false posi- 
tive error, false negative error, and their corresponding losses 
would be desirable in assessing the efficiency of using test scores 
to make decisions regarding individual examinees. 

2. REVIEW OF LITERATURE 

The consideration of decision efficiency was introduced by
Huynh (1975, 1976c) for the case involving constant losses. Let $R_o$
be the expected loss associated with the best use of test data and
$R^*_{\min}$ be the smallest expected loss encountered in the case of no
relationship between true ability and test score. Huynh's efficiency
coefficient, defined as $e = 1 - R_o/R^*_{\min}$, was interpreted as

the proportion of reduction in random loss which would result from 
the best use of test data in the decision-making process. Under 
fairly general conditions regarding the nature of test data, Huynh 
proved that e was included between 0 and 1. The lower bound occurs 
when there is no relationship between test score and true ability; 
the upper bound is reached when there is a perfect increasing rela- 
tionship between these two variables. 

The concept of decision efficiency was later extended under a 
slightly different form by van der Linden and Mellenbergh (1978) 
and Mellenbergh and van der Linden (1979). These writers proposed
the use of the coefficient $\delta = (R_n - R_o)/(R_n - R_c)$, which may be written
equivalently as $\delta = 1 - (R_o - R_c)/(R_n - R_c)$, a form similar to Huynh's
original e. In these formulae, $R_o$ represents the expected loss
associated with any predetermined test passing score; $R_c$ and $R_n$ are
the expected losses encountered in situations in which the test
scores contain complete and no information about the true score,
respectively. As shown by van der Linden and Mellenbergh, there is
a direct relationship between $\delta$ and the classical reliability index
when $\delta$ is computed for linear losses at the optimum test passing
score. In addition, the two special values $\delta = 0$ and $\delta = 1$ have
the same meaning as e. However, van der Linden and Mellenbergh
correctly stated that their proposed $\delta$ may not always be included
between 0 and 1, as would be typically desirable in the formulation
of indices to be used in educational and psychological measurement.
Huynh (1980c) proposed a revised $\delta$ in which $R_c$ represented the
expected loss associated with the best use of completely infallible
data and proved that $0 \le \delta \le 1$ under fairly general conditions.
Wilcox (1978) had also advanced a modification of $\delta$; his index $\gamma$
ranged between 0 and 1. However, these boundary values of $\gamma$ did
not appear to bear direct interpretations in terms of the relationship
between test scores and true ability.

Livingston and Wingersky (1979) proposed the assessment of the 
quality of pass/fail decisions (mastery testing) on the basis of 
the probabilities of making correct and incorrect decisions and on 
the basis of an efficiency index involving these probabilities and 
the corresponding utilities. The issue of errors in decisions has 
been considered at length in the literature (Hambleton & Novick, 
1973; Huynh, 1976a; Wilcox, 1977). In addition, the Livingston-Wingersky
index varies from -1 to +1, a range which often complicates
the interpretation of the index. Estimates for the various
quantities considered by these authors are based on the compound
binomial model, which typically requires the responses of at least
1000 examinees. The requirement seems quite stringent in many
cases involving field testing or the use of mastery tests. (Actually,
as can be seen later in this paper, the Livingston-Wingersky
index relates directly to the raw efficiency index $e_2$; there is
little difference between estimates of $e_2$ based on the compound
binomial and beta-binomial models.)

The purpose of this paper is to provide a general formulation 
of decision efficiency in mastery testing, to provide illustrations 
based on the beta-binomial model, to describe ways to estimate the 
proposed efficiency indices, and to report data comparing estimates 
based on the compound binomial and beta-binomial models. 
Figure 1 provides the motivation for the general formulation
of decision efficiency as presented in the subsequent section. Let
us consider the simplest case in which the losses encountered by
both the false positive and false negative errors are constant and
equal (and are set at Q). With the cell probabilities $p_{ij}$ as previously
defined, the expected loss (Bayes risk) in using test
scores to make decisions is equal to

$$R = Q(p_{01} + p_{10}). \tag{1}$$

Let us presume now that there is no relationship between ability
and test score x, hence mastery/nonmastery decisions are based on a
random process independent of the examinee's ability. For this
situation, the loss is expected to be

$$R_e = Q(p_{\cdot 1} p_{0\cdot} + p_{\cdot 0} p_{1\cdot}). \tag{2}$$

This quantity will be referred to as random-decision risk. In
addition, over all possible values for $\theta_0$ and c, the worst decision
would occur when a true master is always denied mastery status and
a true nonmaster is always granted mastery status. For these extreme
situations, the risk stands at the maximum $R_m = Q$. Under
fairly general conditions (see Section 3), it may be verified that
$R \le R_e$.

From the three expected losses R, $R_e$, and $R_m$, two efficiency
indices may be formulated. First, $R_e - R$ represents the amount of
reduction in the random-decision risk which could be achieved by
using test data. Hence, an index of decision efficiency may be
defined via the ratio

$$e_1 = (R_e - R)/R_e \tag{3}$$

which is the extent to which the reliance on test scores will reduce
the expected loss which would be encountered if no test data (or
completely fallible data) were used in the decision situation defined
by $\theta_0$ and c. From Equations (1) and (2), it may be deduced
that

$$e_1 = (P - P_c)/(1 - P_c)$$

where $P = p_{00} + p_{11}$ and $P_c = p_{0\cdot} p_{\cdot 0} + p_{1\cdot} p_{\cdot 1}$. This index is
actually the kappa index proposed by Cohen (1960) and studied
extensively in the context of mastery testing by Swaminathan,
Hambleton, and Algina (1975) and Huynh (1976b, 1978, 1979a).
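
The equivalence between the chance-corrected efficiency ratio and kappa under equal constant losses can be verified on any two-by-two table. The sketch below uses a hypothetical table of cell probabilities chosen only for illustration; the function name kappa_efficiency is also hypothetical. The first subscript of each cell is taken here as the row and the second as the column, which is an assumption about the layout of the table.

```python
def kappa_efficiency(p00, p01, p10, p11, Q=1.0):
    """Return (efficiency ratio, Cohen's kappa) for a 2x2 table of probabilities."""
    row0, row1 = p00 + p01, p10 + p11        # marginals p_0. and p_1.
    col0, col1 = p00 + p10, p01 + p11        # marginals p_.0 and p_.1
    R = Q * (p01 + p10)                      # expected loss using the test
    R_e = Q * (col1 * row0 + col0 * row1)    # random-decision risk
    efficiency = (R_e - R) / R_e
    P = p00 + p11                            # raw agreement
    P_c = row0 * col0 + row1 * col1          # chance agreement
    kappa = (P - P_c) / (1 - P_c)
    return efficiency, kappa

efficiency, kappa = kappa_efficiency(0.25, 0.05, 0.10, 0.60)
print(round(efficiency, 4), round(kappa, 4))
```

The constant Q cancels in the ratio, so the two returned values should agree for any positive Q.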

A second efficiency index may also be formulated, using R and
$R_m$. It is

$$e_2 = (R_m - R)/R_m. \tag{4}$$

This index represents the extent to which the use of test scores
will reduce the maximum risk which is common to all situations.
From Equation (1), it may be verified that $e_2 = p_{00} + p_{11} = P$,
the probability of making a correct
decision. In the context of reliability of mastery tests, $e_2$ (or
P) is often referred to as the raw agreement index (Subkoviak, 1976;
Huynh, 1979a).

With the rationale for $e_1$ and $e_2$ as stated, a general formulation
of decision efficiency will now be presented.

4. A GENERAL FORMULATION OF DECISION EFFICIENCY

Let $\theta$ be the true ability of a given examinee and $\Omega$ be its
range. For the binomial error model (Lord & Novick, 1968, ch. 23),
$\theta$ may be taken as the proportion of items in a large item pool that
the examinee is expected to answer correctly, and the range $\Omega$ is
the interval [0,1]. Let x be the test score observed for the examinee,
and let x be distributed according to the conditional density
$f(x|\theta)$. In addition, let $p(\theta)$ be the density of $\theta$.

A referral task (Huynh, 1976a) is assumed to exist and is used
as an external criterion for the determination of a passing score.
The task is defined operationally via a nondecreasing function $s(\theta)$
which describes the probability that an examinee with true ability
$\theta$ will succeed in completing the task. As noted in the author's
previous writing (Huynh, 1976a, 1980b), the referral task may be
real or hypothetical. For example, in individualized instructional
programs where a student proceeds from one content unit to the next
(presumably more complex) unit, each succeeding unit may serve as a
referral task for the previous unit. In other situations, where no
hierarchy can be logically or empirically assumed to hold, a
consensus on what constitutes an acceptable level of performance
may be translated into a hypothetical referral task. To be specific,
let us suppose that there exists a constant $\theta_0$ such that
mastery is equivalent to the condition $\theta \ge \theta_0$ and nonmastery is
described by the inequality $\theta < \theta_0$. The corresponding referral
task is operationally defined by the nondecreasing function

$$s(\theta) = 0 \text{ for } \theta < \theta_0 \quad \text{and} \quad s(\theta) = 1 \text{ for } \theta \ge \theta_0.$$

On the basis of the observed test score x and by relying on a
decision rule c, the examinee will be classified in the mastery
status (action $a_1$) or in the nonmastery status (action $a_2$). Let
$C_f(\theta)$ be the opportunity loss incurred in granting mastery status
to an examinee who will eventually fail to perform the referral
task (a false positive error). Likewise, let $C_s(\theta)$ be the loss
associated with the denial of mastery to someone who will succeed
in completing the task (a false negative error). In most practical
situations, action $a_1$ is taken when $x \ge c$, and action $a_2$ is taken
when $x < c$. Here, the constant c is referred to as a test passing
(cutoff, mastery) score.

Within the decision framework as stated, the expected loss
(Bayes risk) associated with the passing score c is given as

$$R = \int_\Omega C_s(\theta) s(\theta) \Pr(x<c|\theta) p(\theta)\,d\theta + \int_\Omega C_f(\theta)\{1-s(\theta)\} \Pr(x \ge c|\theta) p(\theta)\,d\theta. \tag{5}$$

When the test score x is discrete, the integration sign in each of
the two terms on the right side of (5) is to be replaced by the
summation ($\Sigma$) sign. For the special 0-1 form for $s(\theta)$ as defined
previously, the Bayes risk is given as

$$R = \int_{\theta \ge \theta_0} C_s(\theta) \Pr(x<c|\theta) p(\theta)\,d\theta + \int_{\theta < \theta_0} C_f(\theta) \Pr(x \ge c|\theta) p(\theta)\,d\theta. \tag{6}$$

In both Equations (5) and (6), the two separate terms on the right
define the individual Bayes risk for the false negative error and
the false positive error.

Consider now the situation where test data do not reflect the
ability of the examinees and therefore are useless in the decision-making
process. For such a case, there would be no relationship
between ability $\theta$ and test score x; in other words, $\theta$ and x would be
independent of each other. The expected loss may now be written as

$$R_e = \left[\int_\Omega C_s(\theta) s(\theta) p(\theta)\,d\theta\right] \Pr(x<c) + \left[\int_\Omega C_f(\theta)\{1-s(\theta)\} p(\theta)\,d\theta\right] \Pr(x \ge c), \tag{7}$$

and, for the special 0-1 case for $s(\theta)$, as

$$R_e = \left[\int_{\theta \ge \theta_0} C_s(\theta) p(\theta)\,d\theta\right] \Pr(x<c) + \left[\int_{\theta < \theta_0} C_f(\theta) p(\theta)\,d\theta\right] \Pr(x \ge c). \tag{8}$$

Let $p = \Pr(x \ge c)$ so that $1-p = \Pr(x<c)$. Then for the situation in
which no relationship exists between x and $\theta$, the decision process
is carried out by randomly assigning individuals to mastery and
nonmastery categories according to the proportions p and 1-p,
respectively. As in the previous section, the Bayes risk $R_e$ will
be referred to as the random-decision risk, or simply, random risk.

It may be verified from Equation (5) that the Bayes risk R 
cannot exceed the quantity 

R m = ; n c s ( e)s(e)p(e)de + / fl c f (e) (i-s(e)) P (e)de . (9) 

This risk is encountered when mastery/nonmastery decisions based on 
test data are always incorrect, that is, a true master is always 
denied mastery status and a true nonmaster is always granted mastery 
status* 

With the three risks $R$, $R_e$, and $R_m$ as defined, the two
decision efficiency indices $\epsilon_1$ and $\epsilon_2$ may now be
written as

$$\epsilon_1 = 1 - R/R_e \tag{10}$$

and

$$\epsilon_2 = 1 - R/R_m . \tag{11}$$

Since $\epsilon_1$ is a generalization of the corrected-for-chance
kappa index, it seems appropriate to refer to it as the
corrected-for-chance efficiency index. Likewise, with $\epsilon_2$ as
a general case of the raw agreement index, it may be referred to as
the raw efficiency index.

Just as in the case of kappa and P, there are fundamental
differences between $\epsilon_1$ and $\epsilon_2$. The $\epsilon_2$
index is formulated on the basis of the baseline risk $R_m$, which
expresses the worst possible risk which could occur in the
decision-making process. This risk is incurred when decisions
regarding mastery/nonmastery are always incorrect. Thus $\epsilon_2$
equals 1 when decisions are always correct and reaches the minimum 0
when decisions are always incorrect.

On the other hand, $\epsilon_1$ assumes the random risk $R_e$ to be
the baseline risk and expresses the extent to which the use of test
scores will reduce this random risk. As is the case of kappa,
$\epsilon_1$ reveals the magnitude by which the test scores will
improve the effectiveness of the decision-making process beyond the
level which could be expected from random classification. (The random
assignment of examinees to the mastery and nonmastery categories,
however, keeps intact the proportions of masters and of nonmasters as
defined by the observed test score frequencies.) Thus $\epsilon_1$
attains the maximum value of 1 when decisions are always correct. It
will be equal to zero when the decision-making process is carried out
by random classification (i.e., when test scores have no relationship
with the ability of the examinees).
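
As a small illustration of Equations (10) and (11), the following
Python sketch computes the two indices from the three risks. The
function name and the guard against non-positive baseline risks are
illustrative assumptions and are not part of the original program.

```python
def efficiency_indices(risk, random_risk, max_risk):
    """Return (eps1, eps2) as defined in Equations (10) and (11).

    risk        -- Bayes risk R of decisions based on test scores
    random_risk -- random-decision risk R_e
    max_risk    -- maximum risk R_m (decisions always incorrect)
    """
    if random_risk <= 0 or max_risk <= 0:
        # Assumption of this sketch: both baselines must be positive,
        # otherwise the ratios are undefined.
        raise ValueError("baseline risks must be positive")
    eps1 = 1.0 - risk / random_risk   # corrected-for-chance index
    eps2 = 1.0 - risk / max_risk      # raw index
    return eps1, eps2


# Two limiting cases discussed in the text.
print(efficiency_indices(0.0, 0.3, 0.6))  # decisions always correct
print(efficiency_indices(0.3, 0.3, 0.6))  # no better than random assignment
```

In the first call both indices equal 1; in the second the
corrected-for-chance index is 0 while the raw index stays positive,
which is the distinction drawn above between the two baselines.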

It should be clear from the above elaboration that decision
efficiency depends not only on the characteristics of the test (as
reflected in the dependency between $x$ and $\theta$), but also on the
particular circumstances under which the test scores are used to make
decisions regarding the individual examinees. Such circumstances
are reflected in the referral success function $s(\theta)$, the two
loss functions $C_s(\theta)$ and $C_f(\theta)$, and the prior or group
ability density $p(\theta)$.

To complete this section, it may be noted that under all
circumstances $0 \le \epsilon_2 \le 1$ and $\epsilon_1 \le \epsilon_2$.
In addition, since the referral success function $s(\theta)$ enters in
the definition of $R$ and $R_e$, but not in that of $R_m$, it is
expected that $s(\theta)$ will have more influence on $\epsilon_1$
than on $\epsilon_2$. Thus, in the simplest formulation of mastery
testing which involves the true mastery score $\theta_o$, this score
$\theta_o$ will probably have more bearing on $\epsilon_1$ than on
$\epsilon_2$.

5. CONDITIONS UNDER WHICH $\epsilon_1$ IS POSITIVE

In the most general situation, $\epsilon_1$ may be negative. This
section will describe the conditions under which this index is
positive.

From the definition of losses presented at the beginning of
Section 3, it seems reasonable to assume that both $s(\theta)$ and
$C_s(\theta)$ are nondecreasing and that $C_f(\theta)$ is
nonincreasing. In fact, if the referral task is chosen appropriately,
then examinees of higher ability should be more likely to succeed in
performing the task than those of low ability. In addition, the denial
of mastery status should cause less harm for subjects with low ability
than for those with high ability. Likewise, granting mastery status to
a low ability examinee would cause more harm than granting mastery to
a high ability examinee. Thus, it seems sensible to assume that
$C_s(\theta)s(\theta)$ is nondecreasing with respect to $\theta$ and
that $C_f(\theta)\{1-s(\theta)\}$ is nonincreasing with respect to
$\theta$.

Now let us focus on the relationship between ability $\theta$ and
test score $x$. If the test is reasonably well constructed, then the
probability $\Pr(x<c\mid\theta)$ is nonincreasing in its argument
$\theta$. In other words, examinees with low ability are more likely
to get low test scores than those with high ability. This assumption
is tenable if the density $f(x\mid\theta)$ belongs to the monotone
likelihood ratio family (Esary, Proschan, & Walkup, 1967; Dykstra,
Hewett, & Thompson, 1973). It follows from Theorem 1 of Dykstra et al.
that

$$\int_\Omega C_s(\theta)\,s(\theta)\Pr(x<c\mid\theta)\,p(\theta)\,d\theta
  \le \Big(\int_\Omega C_s(\theta)\,s(\theta)\,p(\theta)\,d\theta\Big)
      \Big(\int_\Omega \Pr(x<c\mid\theta)\,p(\theta)\,d\theta\Big) . \tag{12}$$

The last integral is simply the unconditional probability $\Pr(x<c)$.
By using the same theorem, it may be verified that

$$\int_\Omega C_f(\theta)\{1-s(\theta)\}\Pr(x\ge c\mid\theta)\,p(\theta)\,d\theta
  \le \Big\{\int_\Omega C_f(\theta)\{1-s(\theta)\}\,p(\theta)\,d\theta\Big\}\Pr(x\ge c) . \tag{13}$$

It follows that, at each test passing score $c$, $R \le R_e$, and hence
$0 \le \epsilon_1 \le 1$.

6. AN ILLUSTRATION BASED ON THE BETA-BINOMIAL MODEL 
WITH CONSTANT LOSSES AND 0-1 REFERRAL SUCCESS 

Consider now the simple case in which the test score $x$ obtained
from the administration of an $n$-item test to a subject with ability
$\theta$ is distributed according to the binomial density

$$f(x\mid\theta) = \binom{n}{x}\theta^{x}(1-\theta)^{n-x}, \qquad x = 0, 1, \ldots, n. \tag{14}$$

In addition, let it be assumed that the subject comes from a
population of examinees for whom the ability $\theta$ is distributed
according to the beta density

$$p(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}, \qquad 0 < \theta < 1 . \tag{15}$$

Then the unconditional distribution of the test score $x$ is defined
by the negative hypergeometric density

$$f(x) = \binom{n}{x}\frac{B(\alpha+x,\; n-x+\beta)}{B(\alpha,\beta)} . \tag{16}$$

Let $\theta_o$ be the minimum passing level in the ability continuum,
and let $C_f(\theta) = 1$ and $C_s(\theta) = Q$. In other words, $Q$
is the ratio of the constant loss due to a false negative error to the
one produced by a false positive error. The two Bayes risks $R$ and
$R_e$ may now be computed via the following formulae:

$$R = \Pr(\theta<\theta_o,\, x\ge c) + Q\,\Pr(\theta\ge\theta_o,\, x\le c-1) \tag{17}$$

and

$$R_e = \Pr(\theta<\theta_o)\Pr(x\ge c) + Q\,\Pr(\theta\ge\theta_o)\Pr(x\le c-1) . \tag{18}$$

The two probabilities listed in (17) may be obtained from tables of
the incomplete beta function (Pearson, 1934), by use of the formulae
presented in Huynh (1976a, p. 71), or from tables and a computer
program documented in Huynh (1979b, 1980a). The two probabilities
in Equation (18), on the other hand, may be secured by applications
of the inductive formulae reported in Huynh (1976b). It may also
be noted that $R_m = \Pr(\theta<\theta_o) + Q\,\Pr(\theta\ge\theta_o)$.

Numerical Example 1

Consider the situation in which a 10-item test is administered
to a group of examinees and the resulting test scores have a mean
of $\mu = 7.00$ and a KR21 index of $\alpha_{21} = .40$. From the
formulae in Huynh (1976a), it may be deduced that the parameters
defining the beta true ability are
$\alpha = (-1 + 1/\alpha_{21})\mu = 10.5$ and
$\beta = -\alpha + n/\alpha_{21} - n = 4.5$. Let $\theta_o = .60$,
$c = 8$, and $Q = .50$. Then, by using the tables reported in Huynh
(1979b), the rates of false positive error and of false negative error
may be found to be

$$\Pr(\theta<\theta_o,\, x\ge c) = .0173$$

and

$$\Pr(\theta\ge\theta_o,\, x<c) = .3955 .$$

Hence the Bayes risk in using the test scores to make decisions is
$R = .0173 + .50 \times .3955 = .2151$. On the other hand,
$\Pr(x<c) = .5713$ and $\Pr(\theta<\theta_o) = .1931$, and hence
$R_e = .1931 \times .4287 + .50 \times .8069 \times .5713 = .3133$.
In addition, $R_m = .1931 + .50 \times .8069 = .5966$. The decision
efficiency indices are $\epsilon_1 = 1 - .2151/.3133 = .313$ and
$\epsilon_2 = 1 - .2151/.5966 = .639$.
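
The quantities in this example can also be checked by direct numerical
integration. The Python sketch below is an illustration only: it uses
a composite Simpson rule in place of the tables and recurrence formulae
cited above, and all function names are assumptions of the sketch. It
takes the constant losses $C_f = 1$ and $C_s = Q$ and the 0-1 referral
success function of this section.

```python
import math


def beta_pdf(t, a, b):
    """Beta(a, b) density; returns 0 at the endpoints (adequate when a, b > 1)."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


def prob_pass(t, n, c):
    """Pr(x >= c | theta = t) under the binomial model of Equation (14)."""
    return sum(math.comb(n, x) * t**x * (1 - t) ** (n - x) for x in range(c, n + 1))


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def constant_loss_analysis(n, a, b, theta0, c, q):
    """Error rates, the three risks, and both indices for constant losses."""
    fp = simpson(lambda t: beta_pdf(t, a, b) * prob_pass(t, n, c), 0.0, theta0)
    fn = simpson(lambda t: beta_pdf(t, a, b) * (1 - prob_pass(t, n, c)), theta0, 1.0)
    p_nonmaster = simpson(lambda t: beta_pdf(t, a, b), 0.0, theta0)
    p_fail = (p_nonmaster - fp) + fn               # Pr(x < c)
    risk = fp + q * fn                             # Equation (17)
    random_risk = (p_nonmaster * (1 - p_fail)
                   + q * (1 - p_nonmaster) * p_fail)   # Equation (18)
    max_risk = p_nonmaster + q * (1 - p_nonmaster)
    return {
        "fp": fp, "fn": fn, "R": risk, "Re": random_risk, "Rm": max_risk,
        "eps1": 1 - risk / random_risk, "eps2": 1 - risk / max_risk,
    }


result = constant_loss_analysis(n=10, a=10.5, b=4.5, theta0=0.60, c=8, q=0.50)
for name, value in result.items():
    print(f"{name:5s} {value:.4f}")
```

With the inputs of this example the printed error rates and risks are
expected to agree with the tabled values quoted above up to rounding.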

7. DECISION EFFICIENCY FOR THE BETA-BINOMIAL MODEL 
WITH POWER LOSSES AND 0-1 REFERRAL SUCCESS 

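Consider now the beta-binomial model along with the special
0-1 referral success and the losses defined by

$$C_f(\theta) = \begin{cases} (\theta_o-\theta)^{p_1} & \text{for } \theta < \theta_o \\ 0 & \text{for } \theta \ge \theta_o \end{cases} \tag{19}$$

and

$$C_s(\theta) = \begin{cases} Q(\theta-\theta_o)^{p_2} & \text{for } \theta \ge \theta_o \\ 0 & \text{for } \theta < \theta_o . \end{cases} \tag{20}$$

Then, apart from the denominator $B(\alpha,\beta)$, the Bayes risk at
the test passing score $c$ is given as

$$R = Q\int_{\theta_o}^{1} (\theta-\theta_o)^{p_2}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}
      \sum_{x=0}^{c-1}\binom{n}{x}\theta^{x}(1-\theta)^{n-x}\,d\theta
  + \int_{0}^{\theta_o} (\theta_o-\theta)^{p_1}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}
      \sum_{x=c}^{n}\binom{n}{x}\theta^{x}(1-\theta)^{n-x}\,d\theta . \tag{21}$$

Similarly, apart from the denominator $B(\alpha,\beta)$, the
random-decision Bayes risk is given as

$$R_e = Q\Big(\int_{\theta_o}^{1} (\theta-\theta_o)^{p_2}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}\,d\theta\Big)
        \Big(\sum_{x=0}^{c-1} f(x)\Big)
  + \Big(\int_{0}^{\theta_o} (\theta_o-\theta)^{p_1}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}\,d\theta\Big)
        \Big(\sum_{x=c}^{n} f(x)\Big) , \tag{22}$$

and the maximum risk as

$$R_m = Q\int_{\theta_o}^{1} (\theta-\theta_o)^{p_2}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}\,d\theta
  + \int_{0}^{\theta_o} (\theta_o-\theta)^{p_1}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}\,d\theta . \tag{23}$$

When $p_1$ (or $p_2$) is an integer such as in the case of linear or
quadratic losses, the integrals in (21), (22), and (23) which
involve $p_1$ (or $p_2$) may be computed via the incomplete beta
function (Pearson, 1934) and the recurrence formula described as
follows. Let

$$D(u,v;\theta_o) = \int_{0}^{\theta_o} t^{u-1}(1-t)^{v-1}\,dt = B(u,v)\,I(u,v;\theta_o) . \tag{24}$$

Then

$$D(u+1,\,v-1;\,\theta_o) = \big(-\theta_o^{u}(1-\theta_o)^{v-1} + u\,D(u,v;\theta_o)\big)/(v-1) . \tag{25}$$

The computations for $R$, $R_e$, and $R_m$ are simplified
considerably when losses are of the linear form. The Bayes risk $R$
of Equation (21) may now be written as

$$R = \frac{Q}{B(\alpha,\beta)}\int_{\theta_o}^{1}
      \big(\theta^{\alpha+1-1}(1-\theta)^{\beta-1} - \theta_o\,\theta^{\alpha-1}(1-\theta)^{\beta-1}\big)
      \sum_{x=0}^{c-1}\binom{n}{x}\theta^{x}(1-\theta)^{n-x}\,d\theta
  + \frac{1}{B(\alpha,\beta)}\int_{0}^{\theta_o}
      \big(\theta_o\,\theta^{\alpha-1}(1-\theta)^{\beta-1} - \theta^{\alpha+1-1}(1-\theta)^{\beta-1}\big)
      \sum_{x=c}^{n}\binom{n}{x}\theta^{x}(1-\theta)^{n-x}\,d\theta .$$

Let $F_n(n,\alpha,\beta,\theta_o,c)$ and $F_p(n,\alpha,\beta,\theta_o,c)$
denote the false negative and false positive error rates associated
with the beta true ability distribution with parameters $\alpha$ and
$\beta$. By noting that

$$B(\alpha+1,\beta) = \frac{\Gamma(\alpha+1)\Gamma(\beta)}{\Gamma(\alpha+\beta+1)}
  = \frac{\alpha\,\Gamma(\alpha)\Gamma(\beta)}{(\alpha+\beta)\,\Gamma(\alpha+\beta)}
  = \frac{\alpha\,B(\alpha,\beta)}{\alpha+\beta} ,$$

it may be verified that the Bayes risk $R$ is given as

$$R = Q\Big(\frac{\alpha}{\alpha+\beta}F_n(n,\alpha+1,\beta,\theta_o,c) - \theta_o F_n(n,\alpha,\beta,\theta_o,c)\Big)
  + \theta_o F_p(n,\alpha,\beta,\theta_o,c) - \frac{\alpha}{\alpha+\beta}F_p(n,\alpha+1,\beta,\theta_o,c) . \tag{26}$$

Formulae, tables, and a computer program are available (Huynh, 1979a,
1980a) for the computation of the false positive and false negative
error rates.

As for $R_e$ and $R_m$, they may be expressed via the incomplete beta
function as follows:

$$R_e = Q\Big\{\frac{\alpha}{\alpha+\beta}\big[1-I(\alpha+1,\beta;\theta_o)\big] - \theta_o\big[1-I(\alpha,\beta;\theta_o)\big]\Big\}\sum_{x=0}^{c-1} f(x)
  + \Big\{\theta_o I(\alpha,\beta;\theta_o) - \frac{\alpha}{\alpha+\beta}I(\alpha+1,\beta;\theta_o)\Big\}\sum_{x=c}^{n} f(x) \tag{27}$$

and

$$R_m = Q\Big\{\frac{\alpha}{\alpha+\beta}\big[1-I(\alpha+1,\beta;\theta_o)\big] - \theta_o\big[1-I(\alpha,\beta;\theta_o)\big]\Big\}
  + \theta_o I(\alpha,\beta;\theta_o) - \frac{\alpha}{\alpha+\beta}I(\alpha+1,\beta;\theta_o) . \tag{28}$$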

Numerical Example 2 

For the basic data described in the first numerical example,
the use of linear losses ($p_1 = p_2 = 1$) will result in the Bayes
risks $R = .02165$, $R_e = .03865$, and $R_m = .07118$. Hence the
values of the efficiency indices are
$\epsilon_1 = 1 - .02165/.03865 = .440$ and
$\epsilon_2 = 1 - .02165/.07118 = .696$.
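
The linear-loss risks can likewise be approximated by numerical
integration of the expected losses, now including the denominator
$B(\alpha,\beta)$ so that the beta density appears directly. The
Python sketch below is illustrative only; it replaces the incomplete
beta function formulae with a composite Simpson rule, and its function
names are assumptions of the sketch.

```python
import math


def beta_pdf(t, a, b):
    """Beta(a, b) density; returns 0 at the endpoints (adequate when a, b > 1)."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


def prob_pass(t, n, c):
    """Pr(x >= c | theta = t) under the binomial model."""
    return sum(math.comb(n, x) * t**x * (1 - t) ** (n - x) for x in range(c, n + 1))


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def linear_loss_risks(n, a, b, theta0, c, q):
    """Return (R, R_e, R_m) for linear losses (p1 = p2 = 1)."""
    dens = lambda t: beta_pdf(t, a, b)
    # Expected loss from denying mastery to true masters (false negatives).
    fn_loss = simpson(lambda t: (t - theta0) * dens(t) * (1 - prob_pass(t, n, c)),
                      theta0, 1.0)
    # Expected loss from granting mastery to true nonmasters (false positives).
    fp_loss = simpson(lambda t: (theta0 - t) * dens(t) * prob_pass(t, n, c),
                      0.0, theta0)
    upper = simpson(lambda t: (t - theta0) * dens(t), theta0, 1.0)
    lower = simpson(lambda t: (theta0 - t) * dens(t), 0.0, theta0)
    p_pass = simpson(lambda t: dens(t) * prob_pass(t, n, c), 0.0, 1.0)
    risk = q * fn_loss + fp_loss
    random_risk = q * upper * (1 - p_pass) + lower * p_pass
    max_risk = q * upper + lower
    return risk, random_risk, max_risk


r, r_e, r_m = linear_loss_risks(n=10, a=10.5, b=4.5, theta0=0.60, c=8, q=0.50)
print(f"R = {r:.5f}, Re = {r_e:.5f}, Rm = {r_m:.5f}")
print(f"eps1 = {1 - r / r_e:.3f}, eps2 = {1 - r / r_m:.3f}")
```

For the data of the first numerical example the results are expected
to agree with the risks reported above up to rounding.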



8. RELATIONSHIP BETWEEN $\epsilon_2$ AND THE
LIVINGSTON-WINGERSKY EFFICIENCY INDEX

Recently, Livingston and Wingersky (1979) proposed an index of
efficiency for situations in which the consequences of granting or
denying mastery status are expressed in terms of utility. For the
simplest case involving linear and opposite utility, the utility of
granting mastery status is $\theta-\theta_o$ and the utility of
denying mastery status is $\theta_o-\theta$. Here $\theta$ is the true
ability of the examinee, and $\theta_o$ is a given constant. As
before, let $x$ be the observed test score and $c$ be the test passing
score. The efficiency index proposed by Livingston and Wingersky
(1979) is the ratio

$$e = \frac{\sum (\theta-\theta_o)\,\mathrm{sign}(x-c)}{\sum |\theta-\theta_o|} \tag{29}$$

where the summation sign ($\Sigma$) is extended over all examinees.
This index reaches the maximum value of 1 when decisions based on test
data are always correct and the minimum value of -1 when these
decisions are always incorrect.

We will show that a linear relationship exists between the
Livingston-Wingersky efficiency index $e$ and the raw efficiency
index $\epsilon_2$ computed from the corresponding (opportunity) loss
functions. These loss functions are expressed as

$$C_f(\theta) = \begin{cases} 2(\theta_o-\theta) & \text{for } \theta < \theta_o \\ 0 & \text{for } \theta \ge \theta_o \end{cases}$$

and

$$C_s(\theta) = \begin{cases} 2(\theta-\theta_o) & \text{for } \theta \ge \theta_o \\ 0 & \text{for } \theta < \theta_o . \end{cases}$$

Then the raw efficiency index $\epsilon_2$ is given as

$$\epsilon_2 = \frac{\displaystyle\sum_{\theta\ge\theta_o}\sum_{x\ge c}(\theta-\theta_o)
  + \sum_{\theta<\theta_o}\sum_{x<c}(\theta_o-\theta)}{\sum |\theta-\theta_o|} . \tag{30}$$

With the losses as defined, it will now be shown that
$e = 2\epsilon_2 - 1$. In fact, apart from the denominator
$\sum|\theta-\theta_o|$, the quantity $2\epsilon_2 - 1$ is equal to

$$2\sum_{\theta\ge\theta_o}\sum_{x\ge c}(\theta-\theta_o) + 2\sum_{\theta<\theta_o}\sum_{x<c}(\theta_o-\theta)
  - \Big[\sum_{\theta\ge\theta_o}\sum_{x\ge c}(\theta-\theta_o) + \sum_{\theta\ge\theta_o}\sum_{x<c}(\theta-\theta_o)
  + \sum_{\theta<\theta_o}\sum_{x<c}(\theta_o-\theta) + \sum_{\theta<\theta_o}\sum_{x\ge c}(\theta_o-\theta)\Big]$$

$$= \sum_{x\ge c}\Big[\sum_{\theta\ge\theta_o}(\theta-\theta_o) + \sum_{\theta<\theta_o}(\theta-\theta_o)\Big]
  - \sum_{x<c}\Big[\sum_{\theta\ge\theta_o}(\theta-\theta_o) + \sum_{\theta<\theta_o}(\theta-\theta_o)\Big]$$

$$= \sum(\theta-\theta_o)\,\mathrm{sign}(x-c) .$$

This quantity defines the numerator of the Livingston-Wingersky
efficiency index. Thus the relationship $e = 2\epsilon_2 - 1$ holds
for linear and opposite utilities. For other opposite utilities which
define the Livingston-Wingersky general index of efficiency, and with
the corresponding (opportunity) loss functions, it may also be
verified that the same relationship will hold.

As a passing remark to end this section, it may be noted that
Livingston and Wingersky (1979, p. 258) appear to imply that "if
examinees' chances of passing the test were completely unrelated
to their true scores, the efficiency index would have an expected
value of zero." This assertion regarding $e$ apparently is not
complete, as may be seen from the following argument. If there is
complete independence between true ability $\theta$ and observed score
$x$, then it may be verified that at each given pair $(\theta_o, c)$,
the numerator of $e$ in (29) is given as

$$\sum(\theta-\theta_o)\,\mathrm{sign}(x-c) = \Big(\sum(\theta-\theta_o)\Big)\Pr(x\ge c) - \Big(\sum(\theta-\theta_o)\Big)\Pr(x<c) .$$

Hence, when $\sum(\theta-\theta_o) \ne 0$, $e$ is 0 if and only if the
test passing score $c$ is set up such that half of the subjects will
pass and the other half will fail. (This observation also holds for
situations in which the action of granting mastery and the action of
denying mastery have opposite utilities other than opposite linear
ones.)
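
The relationship between the Livingston-Wingersky index and the raw
efficiency index under linear and opposite utilities can be checked
numerically on simulated examinees. The Python sketch below is an
illustration only: the simulated data, the seed, and the function
names are assumptions, and sign(x - c) is taken to be +1 when x = c,
since such an examinee passes.

```python
import random


def lw_index(pairs, theta0, c):
    """Livingston-Wingersky index e for (theta, x) pairs, linear utilities."""
    num = sum((t - theta0) * (1 if x >= c else -1) for t, x in pairs)
    den = sum(abs(t - theta0) for t, _ in pairs)
    return num / den


def raw_efficiency(pairs, theta0, c):
    """Raw efficiency index: weighted share of correct decisions."""
    correct = sum(abs(t - theta0) for t, x in pairs
                  if (t >= theta0) == (x >= c))
    den = sum(abs(t - theta0) for t, _ in pairs)
    return correct / den


rng = random.Random(0)          # fixed seed so the run is repeatable
n, theta0, c = 10, 0.60, 8
pairs = []
for _ in range(500):
    theta = rng.betavariate(10.5, 4.5)                   # simulated true ability
    score = sum(rng.random() < theta for _ in range(n))  # binomial test score
    pairs.append((theta, score))

e = lw_index(pairs, theta0, c)
eps2 = raw_efficiency(pairs, theta0, c)
print(abs(e - (2 * eps2 - 1)) < 1e-9)
```

The check holds for any set of pairs with a nonzero denominator,
because the identity is algebraic and does not depend on how the
abilities and scores were generated.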

9. ESTIMATION PROCEDURES BASED ON THE BETA-BINOMIAL
AND COMPOUND BINOMIAL ERROR MODELS

The estimation of the decision efficiency indices $\epsilon_1$ and
$\epsilon_2$ may be carried out on the basis of the observed test data
if reasonable assumptions can be made regarding the functional forms
of the conditional probability $\Pr(x<c\mid\theta)$ and of the density
$p(\theta)$ of the true ability.

When the beta-binomial error model (Lord & Novick, 1968,
ch. 23) is appropriate, the estimation of decision efficiency under
constant or power losses may be carried out via the formulae
described in Sections 6 and 7. In using these formulae, the
parameters $\alpha$ and $\beta$ of the beta distribution are to be
replaced by their corresponding estimates based on sample data. A
commonly used set of estimates is the moment estimates which are
obtained as follows. Let $\bar{x}$ and $s$ be the mean and standard
deviation of the test scores, and let the KR21 reliability be defined
as

$$\alpha_{21} = \frac{n}{n-1}\Big[1 - \frac{\bar{x}(n-\bar{x})}{n s^{2}}\Big] . \tag{31}$$

Then the moment estimates of $\alpha$ and $\beta$ are given as

$$\hat{\alpha} = (-1 + 1/\alpha_{21})\,\bar{x} \tag{32}$$

and

$$\hat{\beta} = -\hat{\alpha} + n/\alpha_{21} - n . \tag{33}$$
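
The moment estimates of Equations (31) to (33) are simple enough to
compute directly. The following Python sketch is an illustration;
the function names are assumptions and are not part of the program in
Appendix A.

```python
def kr21(n, mean, sd):
    """KR21 reliability from the number of items, score mean, and score S.D."""
    return n / (n - 1) * (1 - mean * (n - mean) / (n * sd**2))


def beta_moment_estimates(n, mean, alpha21):
    """Moment estimates (alpha, beta) of the beta true-ability distribution."""
    alpha = (-1 + 1 / alpha21) * mean
    beta = -alpha + n / alpha21 - n
    return alpha, beta


# Data of Numerical Example 1: n = 10, mean 7.00, KR21 = .40.
print(beta_moment_estimates(10, 7.00, 0.40))

# Case A of Table 1: n = 10, mean 7.2315, S.D. 2.6888.
a21 = kr21(10, 7.2315, 2.6888)
print(a21, beta_moment_estimates(10, 7.2315, a21))
```

The first call corresponds to the parameters 10.5 and 4.5 used in the
numerical examples; the second is expected to agree with the Case A
entries of Table 1 up to rounding.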

While the beta-binomial model has been found to fit several
test score distributions reasonably well (Keats & Lord, 1962;
Duncan, 1974), and to provide useful results in mastery testing
(Huynh, 1976a, 1976b, 1977, 1979, 1980a), the compound binomial
error model (Lord, 1965, 1969) has been advocated as a more
realistic model for the description of actual test data. Livingston
and Wingersky (1979) used the latter model to obtain estimates for
the false positive and false negative error rates, estimates for
decision accuracy (proportion of examinees who are correctly
classified), and estimates of the decision efficiency index $e$ under
linear and opposite utilities. A basic feature of the estimation
process is the use of Lord's Method 20 (Lord, 1969) as implemented
by Wingersky, Lees, Lennon, and Lord (1969). Its use is recommended
for data from at least 1000 examinees.

In small-scale testing programs such as those associated with
field testing for mastery tests or those conducted at the
school-district level, the requirement of 1000 examinees cannot be
easily fulfilled. In addition, the data presented in Wilcox (1977)
seem to indicate that as far as error rates (and therefore efficiency
under constant losses) are concerned, the use of the more complex
compound binomial model instead of the simple beta-binomial model
does not substantially improve the accuracy of the estimates.

This section will compare estimates of $\epsilon_2$ based on the
beta-binomial model with those computed from the compound binomial
model as implemented by Livingston and Wingersky (1979). (These
authors proposed the use of the index $e$, which is
$2\epsilon_2 - 1$.) For the case of constant and equal losses, the
estimate for $\epsilon_2$ is simply the sum of the two probabilities
of making a correct decision. Hence, in using the output described by
Livingston and Wingersky, the compound binomial estimate for
$\epsilon_2$ may be obtained by summing the probabilities which appear
in the two cells "Should Pass/Will Pass" and "Should Fail/Will Fail."
For the first output reported in Figure 1 of the Livingston-Wingersky
paper, this estimate is 55.9% + 24.3% = 80.2% or .802. The output also
reports the compound binomial estimate for the efficiency index $e$
under linear and opposite utilities. The (raw) efficiency index
$\epsilon_2$, in turn, may be deduced from $e$ via the formula
$\epsilon_2 = (1+e)/2$. For the output just referenced, the value of
$e$ is 0.81, hence the estimate for $\epsilon_2$ is
$(1+0.81)/2 = .905$.

The compound binomial estimates for efficiency index $\epsilon_2$
under constant and linear losses with $Q = 1$ (or under constant and
linear, but opposite utilities) were derived from the computer
programs provided by Livingston and Wingersky. The corresponding
estimates based on the beta-binomial model were obtained via the
computer program listed in Appendix A. The comparison of the two sets
of estimates was made using the basic test data summarized in Table 1.
These data were extracted from the Comprehensive Tests of Basic
Skills data file collected in the 1978 South Carolina statewide
testing program. In this table, $s^2_{\text{diff}}$ represents the
variance of the item difficulty (defined as the proportion of
examinees who correctly answered the item).

TABLE 1

Description of Test Data Used to Compare the Beta-Binomial
and Compound Binomial Estimates of $\epsilon_2$

Case    n    Mean       S.D.      $s^2_{\text{diff}}$ (x 10^4)   $\hat{\alpha}$   $\hat{\beta}$   $\hat{\alpha}_{21}$
A      10     7.2315     2.6888     64.87                        1.7693           0.6774          .8034
B      15     8.6247     3.1932    301.61                        3.9433           2.9148          .6862
C      20    16.1621     3.8987     97.93                        3.1278           0.7427          .8379
D      30    18.0707     6.3192    202.90                        3.2300           2.1323          .8484
E      40    23.5658     8.3406    281.87                        3.1258           2.1799          .8829
F      50    30.4848    10.7558    205.92                        2.8152           1.8022          .9155



Table 2 reports the estimates of $\epsilon_2$ for a variety of
combinations of $\theta_o$ and $c$. The data reveal only negligible
discrepancies between the beta-binomial estimates and those based on
the compound binomial model. Since the beta-binomial estimates only
require estimation of the two parameters of the beta distribution,
they may be safely obtained from the responses of a small or moderate
sample of examinees. For a sample of this type, estimation via the
compound binomial model may not be appropriate.

TABLE 2

Estimates of $\epsilon_2$ Based on the Beta-Binomial (BB)
and Compound Binomial (CB) Models

                             Opposite & Constant    Opposite & Linear
Case   $\theta_o$    c          BB        CB           BB        CB
A         .70        7         .874      .893         .948      .950
B         .70       10         .792      .798         .898      .905
C         .70       14         .912      .923         .972      .975
D         .80       24         .901      .906         .977      .980
E         .80       32         .920      .917         .985      .985
F         .80       40         .925      .934         .987      .990

10. COMPUTER PROGRAM

A FORTRAN IV program which provides an analysis of decision
efficiency for the case of constant and linear losses is listed in
Appendix A. For each problem, the input data are to be "keypunched"
on two cards detailed as follows.

First Card

This card contains the title of the problem, keypunched between
columns 1 and 80.

Second Card

This card provides data on number of items ($n$), the alpha ($\alpha$)
and beta ($\beta$) parameters of the true ability distribution, the
true mastery score ($\theta_o$), the test passing score ($c$), and the
loss ratio ($Q$). These must be keypunched according to the format
(I5, 2F10.5, F5.2, I5, F5.2).

For example, the efficiency analysis described in numerical
examples 1 and 2 may be performed via the computer program using
the following two input cards.

                      1    1    2    2    3    3    4
Column:      1...5....0....5....0....5....0....5....0
First card:  AN EXAMPLE OF DECISION EFFICIENCY ANALYSIS
Second card:    10      10.5       4.5  .60    8  .50

Table 3 lists the output for this problem.

Several problems may be performed in one run by stacking the
input cards together.
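
To make the column layout of the second card concrete, the following
Python sketch builds a card for the example above and splits it back
into its six fixed-width fields. It is an illustration only and not
part of the FORTRAN program; the function name is an assumption, and
blank fields are read as zero, as a simplification of FORTRAN input
conventions.

```python
# Field layout implied by the format (I5, 2F10.5, F5.2, I5, F5.2):
# (start column, end column, converter), with 0-based half-open slices.
FIELDS = [
    (0, 5, int),      # n, number of items
    (5, 15, float),   # alpha
    (15, 25, float),  # beta
    (25, 30, float),  # theta zero
    (30, 35, int),    # test passing score c
    (35, 40, float),  # loss ratio Q
]


def parse_second_card(card):
    """Split an 80-column card image into (n, alpha, beta, theta0, c, Q)."""
    card = card.ljust(40)
    values = []
    for lo, hi, convert in FIELDS:
        text = card[lo:hi].strip() or "0"   # simplification: blank reads as zero
        values.append(convert(text))
    return tuple(values)


# Build the card of the worked example with explicit field widths.
card = f"{10:5d}{10.5:10.1f}{4.5:10.1f}{0.60:5.2f}{8:5d}{0.50:5.2f}"
print(repr(card))
print(parse_second_card(card))  # (10, 10.5, 4.5, 0.6, 8, 0.5)
```

Because every value on the card carries an explicit decimal point, the
number of decimal places named in the format does not affect how the
fields are read.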

11. SUMMARY 

This paper describes two indices which pertain to the effi- 
ciency of decisions in mastery testing. The indices are gener- 
alizations of the raw agreement index and the kappa index. Both 
express the reduction in proportion of losses (or the gain in pro- 
portion of utility) resulting from the use of test scores to make 
decisions. Empirical data reveal only negligible discrepancies
between the beta-binomial and compound binomial estimates for these
indices.

TABLE 3

An Output of the Computer Program

ANALYSIS OF DECISION EFFICIENCY BASED ON THE
BETA-BINOMIAL MODEL. THE TITLE OF THIS PROBLEM IS
AN EXAMPLE OF DECISION EFFICIENCY ANALYSIS
INPUT DATA ARE:

    NUMBER OF ITEMS ......        10
    ALPHA ................  10.50000
    BETA .................   4.50000
    THETA ZERO ...........   0.60000
    TEST PASSING SCORE ...         8
    LOSS RATIO Q .........   0.50000

FOUR-CELL TABLE WITH PROBABILITIES

    SHOULD FAIL AND WILL FAIL ..    0.1758
    SHOULD PASS AND WILL PASS ..    0.4113
    SHOULD FAIL BUT WILL PASS
    (A FALSE POSITIVE ERROR) ...    0.0173
    SHOULD PASS BUT WILL FAIL
    (A FALSE NEGATIVE ERROR) ...    0.3955

FOR LINEAR LOSSES, THE OUTPUT ARE:

    RISK FOR USING TEST SCORES ...   0.02165
    RANDOM-DECISION RISK .........   0.03865
    MAXIMUM RISK .................   0.07118

DECISION-EFFICIENCY INDICES:

    CORRECTED-FOR-CHANCE INDEX ... E1 =  0.440
    NO CORRECTION FOR CHANCE
    (RAW) INDEX .................. E2 =  0.696

** NORMAL END OF PROGRAM **
   PROGRAM WRITTEN BY
   HUYNH HUYNH
   COLLEGE OF EDUCATION
   UNIVERSITY OF SOUTH CAROLINA
   COLUMBIA, SOUTH CAROLINA 29208
   MAY 1980

APPENDIX A

A Computer Program for the Analysis of the Efficiency
of Decisions in Mastery Testing
Based on the Beta-Binomial Model

Disclaimer: The computer program hereafter listed has been written
with care and tested extensively under a variety of conditions using
tests with 50 or fewer items. The author, however, makes no
warranty as to its accuracy and functioning, nor shall the fact of its
distribution imply such warranty.

C
C     A COMPUTER PROGRAM FOR THE COMPUTATION OF DECISION-EFFICIENCY
C     UNDER CONSTANT OR LINEAR LOSSES AND WITH BETA-BINOMIAL TEST DATA.
C     CONSTANT LOSSES INCLUDE CONSTANT UTILITIES, AND LINEAR LOSSES
C     INCLUDE LINEAR AND OPPOSITE UTILITIES.
C
C     INPUT DATA ARE:
C
C     FIRST CARD:  TITLE OF THE PROBLEM (ENTER ANYTHING YOU WANT)
C
C     SECOND CARD: ENTER THE FOLLOWING INFORMATION, USING THE FORMAT
C                  (I5,2F10.5,F5.2,I5,F5.2)
C
C        N .... NUMBER OF TEST ITEMS
C        A .... ALPHA PARAMETER OF THE BETA DISTRIBUTION
C        B .... BETA PARAMETER OF THE BETA DISTRIBUTION
C        TT ... THETA ZERO (MINIMUM TRUE SCORE FOR PASSING)
C        IM ... TEST PASSING SCORE
C        Q .... LOSS RATIO
C
C     SEVERAL PROBLEMS MAY BE RUN CONSECUTIVELY BY STACKING THE INPUT
C     CARDS TOGETHER.
C
C     SUBROUTINE REQUIRED: THE BDTR OF THE SCIENTIFIC SUBROUTINE
C     PACKAGE.
C
      DOUBLE PRECISION A,B,TT,FP,FN,FP1,FN1,SUM
      DIMENSION W(20)
    1 READ(5,100,END=99) W
  100 FORMAT(20A4)
      WRITE(6,200) W
  200 FORMAT('1','ANALYSIS OF DECISION EFFICIENCY BASED ON THE'/
     *T2,'BETA-BINOMIAL MODEL. THE TITLE OF THIS PROBLEM IS:'/T2,20A4)
      READ(5,110) N,A,B,TT,IM,Q
  110 FORMAT(I5,2F10.5,F5.2,I5,F5.2)
      WRITE(6,230) N,A,B,TT,IM,Q
  230 FORMAT(T2,'INPUT DATA ARE:'//
     * T6,'NUMBER OF ITEMS ......',I10/
     * T6,'ALPHA ................',F10.5/
     * T6,'BETA .................',F10.5/
     * T6,'THETA ZERO ...........',F10.5/
     * T6,'TEST PASSING SCORE ...',I10/
     * T6,'LOSS RATIO Q .........',F10.5//)
      CALL ERRFPN(N,A,B,TT,IM,FP,FN)
      CALL ERRFPN(N,A+1.D0,B,TT,IM,FP1,FN1)
      CALL MDBETA(TT,A,B,P1,IER)
      CALL MDBETA(TT,A+1.D0,B,P2,IER)
      ZZ=A/(A+B)
      R=Q*(ZZ*FN1-TT*FN)+TT*FP-ZZ*FP1
      AA=Q*(ZZ*(1.-P2)-TT*(1.-P1))
      BB=TT*P1-ZZ*P2
      RM=AA+BB
      CALL NEHY3(N,A,B,IM,SUM)
      RE=AA*SUM+BB*(1.-SUM)
      E1=1.-R/RE
      E2=1.-R/RM
      P1=SUM-FN
      P2=1.-SUM-FP
      WRITE(6,236) P1,P2,FP,FN
  236 FORMAT(T2,'FOUR-CELL TABLE WITH PROBABILITIES'//
     * T6,'SHOULD FAIL AND WILL FAIL ..',F10.4/
     * T6,'SHOULD PASS AND WILL PASS ..',F10.4/
     * T6,'SHOULD FAIL BUT WILL PASS'/
     * T6,'(A FALSE POSITIVE ERROR) ...',F10.4/
     * T6,'SHOULD PASS BUT WILL FAIL'/
     * T6,'(A FALSE NEGATIVE ERROR) ...',F10.4//
     * T2,'FOR LINEAR LOSSES, THE OUTPUT ARE:'//)
      WRITE(6,240) R,RE,RM,E1,E2
  240 FORMAT(T6,'RISK FOR USING TEST SCORES ...',F10.5/
     * T6,'RANDOM-DECISION RISK .........',F10.5/
     * T6,'MAXIMUM RISK .................',F10.5//
     * T2,'DECISION-EFFICIENCY INDICES:'//
     * T6,'CORRECTED-FOR-CHANCE INDEX ... E1 = ',F6.3/
     * T6,'NO CORRECTION FOR CHANCE'/
     * T6,'(RAW) INDEX .................. E2 = ',F6.3)
      GOTO 1
   99 WRITE(6,150)
  150 FORMAT(T2,'** NORMAL END OF PROGRAM **'/
     * T2,'   PROGRAM WRITTEN BY'/
     * T2,'   HUYNH HUYNH'/
     * T2,'   COLLEGE OF EDUCATION'/
     * T2,'   UNIVERSITY OF SOUTH CAROLINA'/
     * T2,'   COLUMBIA, SOUTH CAROLINA 29208'/
     * T2,'   MAY 1980')
      STOP
      END
      SUBROUTINE ERRFPN(N,A,B,TT,IM,FP,FN)
      DOUBLE PRECISION A,B,TZ,BETA,DFCT,U,V,DX,ONE,Y1,
     *VMONE,BB,DF(61),FP,FN,
     *E(2),TT,P1,BA,BI
      EXTERNAL BETA,BI,DFCT
C
      ONE=1.D0
      Y1=BETA(A,B)
C     SET UP FOR FALSE POSITIVE ERRORS
      TZ=TT
      IC=IM
      U=A+DFLOAT(IC)
      V=B+DFLOAT(N-IC)
      DO 40 L=1,2
C
      F=ONE-TZ
      DX=DFCT(U,V,TZ)
      BB=BI(N,IC)
      E(L)=DX*BB
C
      BA=BETA(U,V)
C
      IF(IC.EQ.N) GO TO 30
C
   10 IZ=N-IC
      DO 15 I=1,IZ
      IX=IC+I
      VMONE=V-ONE
      Z1=-(TZ**U)*F**VMONE
C
      DX=(Z1+U*DX)/VMONE
C
      BB=BB*(N-IX+1)/IX
C
      V=V-ONE
      BA=BA*U/V
C
      U=U+ONE
C
      E(L)=E(L)+BB*DX
   15 CONTINUE
   30 IF(L.EQ.1) GOTO 35
C
C     INTERCHANGE DFPA AND DFPB FOR FALSE NEGATIVE ERROR
C
   35 E(L)=E(L)/Y1
C     SET UP FOR FALSE NEGATIVE ERRORS
      TZ=ONE-TT
      IC=N-IM+1
      U=B+DFLOAT(IC)
      V=A+DFLOAT(N-IC)
C
   40 CONTINUE
C
      FP=E(1)
      FN=E(2)
C
      RETURN
      END
C
      DOUBLE PRECISION FUNCTION BI(N,M)
      BI=1
      IF(M*(N-M).EQ.0) GOTO 20
      MM=M
      IF(M.GT.(N-M)) MM=N-M
      DO 15 J=1,MM
   15 BI=BI*(N-J+1)/J
   20 RETURN
      END
C
      SUBROUTINE NEHY3(N,A,B,IM,SUM)
      DOUBLE PRECISION A,B,F,Z1,Z2,SUM
      Z1=DFLOAT(N)+B
      Z2=Z1+A
      K=0
      F=1.D0
      DO 5 I=1,N
    5 F=F*(Z1-DFLOAT(I))/(Z2-DFLOAT(I))
      SUM=F
   10 KP1=K+1
      IF(KP1.GE.IM) RETURN
      F=F*DFLOAT(N-K)*(A+DFLOAT(K))/
     *  (DFLOAT(KP1)*(Z1-DFLOAT(KP1)))
      SUM=SUM+F
      K=K+1
      GOTO 10
      END
C
      DOUBLE PRECISION FUNCTION DFCT(A,B,TZ)
      EXTERNAL BETA
      DOUBLE PRECISION A,B,TZ,BETA
C
      CALL MDBETA(TZ,A,B,P,IER)
C
      IF(IER.NE.0) WRITE(6,100)A,B,TZ,IER
  100 FORMAT('0',' ERROR IN BDTR, A B TZ IER ARE ',3F20.10,I5)
      DFCT=DBLE(P)*BETA(A,B)
      RETURN
      END
      DOUBLE PRECISION FUNCTION BETA(X,Y)
      DOUBLE PRECISION A,B,CON,X,Y,F
      F=5.D0
      A=X
      B=Y
      CON=1.D0
      IF(A.LE.F) GOTO 2
    1 A=A-1.D0
      CON=CON*A/(A+B)
      IF(A.LE.F) GOTO 2
      GOTO 1
    2 IF(B.LE.F) GOTO 4
    3 B=B-1.D0
      CON=CON*B/(A+B)
      IF(B.LE.F) GOTO 4
      GOTO 3
    4 BETA=DGAMMA(A)*DGAMMA(B)/DGAMMA(A+B)*CON
      RETURN
      END
C
      SUBROUTINE MDBETA(X,A,B,P,IER)
      DOUBLE PRECISION A,B,X,BETA
      EXTERNAL BETA
      IF(A.GT..5 .AND. B.GT..5) GOTO 10
      IF(A.GT..5 .AND. B.LT..5) GOTO 20
      IF(A.LT..5 .AND. B.GT..5) GOTO 30
C     OTHERWISE BOTH A AND B ARE SMALLER THAN .5
      AA=A+1.
      BB=B+1.
      XX=X
      CALL BDTR(XX,AA,BB,P,D,IER)
      P=X**A*(1.D0-X)**B/(A*BETA(A,B))+X**B*(1.D0-X)**(A+1.D0)/
     *  (B*BETA(A+1.D0,B)) + P
      RETURN
   10 AA=A
      BB=B
      XX=X
      CALL BDTR(XX,AA,BB,P,D,IER)
      RETURN
   20 AA=A
      BB=B+1.
      XX=X
      CALL BDTR(XX,AA,BB,P,D,IER)
      P=X**B*(1.D0-X)**A/(B*BETA(A,B))+ P
      RETURN
   30 AA=A+1.
      BB=B
      XX=X
      CALL BDTR(XX,AA,BB,P,D,IER)
      P=X**A*(1.D0-X)**B/(A*BETA(A,B)) + P
      RETURN
      END

ASSESSING TEST SENSITIVITY IN MASTERY TESTING

Huynh Huynh

University of South Carolina

A preliminary version of this paper was presented as part of the
symposium "Approaches to test design for the assessment of the
effectiveness of educational programs" sponsored by the American
Educational Research Association at its annual meeting in Boston,
April 7-11, 1980.

ABSTRACT

This paper addresses the concept of test sensitivity within the
context of mastery testing. It is argued that correlation-based
indices may not be appropriate for the assessment of test
sensitivity. Global assessment of test sensitivity may be carried out
via indices such as p-max or 6-max. Local measures of sensitivity may
be described via a two-parameter logistic model. Procedures are
described to check the tenability of test sensitivity on the basis
of observed test data.



1. INTRODUCTION 

Educational tests which are used for student or program evaluation
are often described using terms such as "criterion-referenced,"
"domain-referenced," or "mastery" tests (Harris, Alkin, and Popham,
1974; Berk, 1980). It is important to note, however, that these
different labels often refer to different aspects of the same
process; depending on the context, all might be used to describe the
same test. For example, test items can be deliberately constructed
(or selected from an item bank) to reflect specific educational
objectives; the resulting test scores are referenced to these
objectives for interpretation and may then be used to assess the
competency or mastery status of the individual student with respect
to each of the objectives. For reasons of specificity, the term
mastery testing will be used in this paper. By mastery testing, it
is meant that, at the end of the testing process, test scores are
used to make decisions regarding the individual student. In most
testing for instructional purposes (such as formative testing or
basic skills assessment programs) and for certification (in the
professions or in minimum competency testing programs), there are
two decision categories based on test scores, namely mastery and
nonmastery. Students with high test scores are granted mastery
status (in the domain of performances or educational objectives
underlying the test) and perhaps are permitted to move to a more
advanced or complex instructional unit. Other students with low
scores will be placed in the nonmastery category and will perhaps
be provided with the opportunity of remedial instruction.

This paper has been distributed separately as RM 80-7, August, 1980.

In the light of the above discussion, it appears clear that a 
mastery test is most useful if it can differentiate students who 
have mastered the educational objectives from those who have not. 
The extent to which the test fulfills this specific requirement 
will be referred to as instructional sensitivity (Harris, 1977; 
Haladyna and Roid, 1980). Of course, the concept of test sensitiv-
ity cannot be detached from the unique purposes and/or circumstances
for which the test scores are to be used. 

Another situation in which the concept of test sensitivity is 
called upon involves the use of test scores for admission or place-
ment purposes. Here, decisions are made on whether or not the test
scores show sufficient evidence that the student or applicant has 
the prerequisite skills or knowledge for a successful performance 
in the training or instructional program. For example, admission 
to a statistics course may require a minimal level of performance 
in arithmetic; hence arithmetic test scores may be used as a cri-
terion for admission to such a course. In this case, test sensi-
tivity may be framed within the context of predictive validity; a
test may be said to be sensitive to the content of a course to the
extent that test scores can separate those who, given effective
instruction, will succeed in the course from the others who will not.

The purpose of this paper is to address the concept of test 
sensitivity within the context of mastery testing (Huynh, 1976), 
and to propose new ways to assess the degree to which a test is 
sensitive to the particular purpose for which it is intended. 

2. POSSIBLE MISUSE OF CORRELATION 
TO ASSESS TEST SENSITIVITY 

A variety of designs has been proposed to assess test sensi- 
tivity. Most involve the use of two contrasting groups of test 
scores. For example, a pretest-posttest design may be in order if 
there are reasons to assume that instruction is effective. In 
other words, a mastery test is given prior to instruction and 
again at the completion of instruction. The mastery test is
sensitive to the instructional objectives to the extent that the
distribution of pretest scores and that of posttest scores can be
separated from each other. Another contrasting groups design
involves the use of an uninstructed group and an instructed group.
This design is appropriate for a test to be used to admit students
to a course; in this case the instructed group would consist of 
students who have successfully completed the course and the un- 
instructed group would be formed of students who have failed the 
course. 

How should test sensitivity be assessed on the basis of the 
separation between the test score distributions of the two contrast-
ing groups? Is the point biserial correlation an appropriate
choice for test sensitivity? (The reader may note that this corre-
lation may be obtained by assigning the dummy code $X = 0$ to the
lower score group and $X = 1$ to the higher score group and then by
computing the Pearson correlation between $X$ and the test scores.)
Correlation, typically, is influenced by the variability in the 
test scores, yet test score variation usually does not play a 
major role in mastery testing (Millman and Popham, 1974). To 
substantiate this point, let a mastery test be such that all pre- 
test scores are below the score of 20 and all posttest scores are 
above this score 20. Then, for classification purposes, a passing
score of 20 would be selected. It takes little imagination to
see that the test is completely sensitive (i.e., it completely sepa-
rates the pretest score distribution from the posttest score
distribution). Yet, the point biserial correlation between the
dummy code $X$ and the test scores will change according to the means
and standard deviations of the pretest and posttest scores. Follow-
ing are two examples based on contrasting groups of ten subjects each.

       Pretest              Posttest           Point
    Mean     S.D.        Mean     S.D.        biserial

   14.10     2.21       23.00     2.68          .88
   10.40     5.52       31.00    12.05          .74
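
The effect shown in the two examples can be reproduced with a short sketch. The scores below are hypothetical (they are not the data behind the table above); in both sets every pretest score is below 20 and every posttest score is above 20, so the separation is perfect, yet the point biserial correlation differs because the spread of the scores differs.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def point_biserial(pretest, posttest):
    """Correlate the dummy code (0 = pretest, 1 = posttest) with the scores."""
    codes = [0] * len(pretest) + [1] * len(posttest)
    return pearson(codes, list(pretest) + list(posttest))

# Hypothetical scores, chosen only for illustration.
tight_pre, tight_post = [12, 13, 14, 15, 16], [22, 23, 24, 25, 26]
loose_pre, loose_post = [1, 5, 10, 15, 19], [21, 30, 40, 50, 60]

r_tight = point_biserial(tight_pre, tight_post)
r_loose = point_biserial(loose_pre, loose_post)
print(round(r_tight, 3), round(r_loose, 3))
```

Both data sets would be classified without error at a passing score of 20, but the more variable set yields the smaller correlation.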

3. A SIMPLE ALTERNATIVE TO POINT BISERIAL CORRELATION

The above numerical illustration clearly indicates that the use 
of point biserial correlation (or of similar indices) may not be
appropriate if the distribution of the pretest scores or that of 
the posttest scores shows a large degree of variability. Unfor- 
tunately, it is a common experience that the pretest scores tend to 
show substantial variation. This is probably true for the case 
involving an uninstructed group, as well. (This occurs mainly 
because of random guessing and differences in input student 
characteristics.)

Thus, alternatives to point biserial correlation may be needed 
to assess test sensitivity in the use of test scores to make educa- 
tional decisions. There are a variety of ways to approach the 
issue. For example, something like the maximum raw agreement index 
(p-max) may be appropriate. This index is very simple to concep- 
tualize and to compute. At each possible cutoff score, compute the 
raw agreement index p between the grouping categories (pretest 
versus posttest, uninstructed versus instructed) and the decisions 
based on the test data (nonmastery versus mastery). Then search 
for the maximum of these raw agreement indices. This maximum p 
value corresponds to the situation in which the test scores are put 
to the best use. For both data sets in the previous illustration,
the maximum of p (or p-max) is exactly 1. 
FIGURE I

Configuration of Decisions Based
on Contrasting-Group Data

                                    Test data
Contrasting groups          Nonmastery      Mastery      Marginal sum
                             (j = 0)        (j = 1)

Posttest (instructed)        $n_{10}$       $n_{11}$       $n_1$
  i = 1
Pretest (uninstructed)       $n_{00}$       $n_{01}$       $n_0$
  i = 0

The nonmastery and mastery columns are separated by the cutoff score.


Figure I depicts the configuration of decisions based on
contrasting-group data. Let the index $i$ take the value 0 when the
individual test score belongs to the pretest (or uninstructed)
group, and the value 1 when the test score belongs to the posttest
(or instructed) group. On the other hand, let the index $j$ be 0
when the test score is smaller than the cutoff score $c$ (nonmastery
status), and 1 when the test score is at least $c$ (mastery status).
The number of test scores in the combined contrasting groups in
each $(i,j)$-cell will be denoted as $n_{ij}$. In addition, let
$n_0 = n_{00} + n_{01}$ be the number of pretest (uninstructed) scores and
$n_1 = n_{10} + n_{11}$ be the number of posttest (instructed) scores. For
the pretest-posttest design with no dropouts (experimental mortali-
ty), $n_0 = n_1$. For the most general situation, particularly when
the instructed-uninstructed design is contemplated, $n_0$ and $n_1$ are
not typically equal.

With the notation as defined, the p index at each cutoff score
is given as

$$p = \frac{1}{2}\left(\frac{n_{11}}{n_1} + \frac{n_{00}}{n_0}\right), \qquad (1)$$

and the p-max index is simply the maximum of p when the cutoff score
varies in its range of possible scores.

Numerical Illustration 1 

Let $n_{00} = 5$, $n_{01} = 10$, $n_{10} = 15$, and $n_{11} = 20$. Then $n_0 = 15$
and $n_1 = 35$. Hence $p = .452$.

Numerical Illustration 2 

Table 1 reports the frequency distributions of the pretest and 
the posttest scores of 50 students on a four-item test. The p in- 
dices are listed as follows. 



Cutoff score       1       2       3       4
p-index           .67     .76     .77     .64



From this list, it may be deduced that p-max is .77. 

TABLE 1 

Frequency Distributions of Pretest and Posttest Data 
for Fifty Students 

Test score     Pretest frequency     Posttest frequency

    0                 20                     3
    1                 10                     1
    2                  8                     7
    3                  7                    20
    4                  5                    19
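
As a minimal sketch of the search for p-max, the following code applies the index $p = \frac{1}{2}(n_{00}/n_0 + n_{11}/n_1)$ to the Table 1 frequencies at every cutoff score. The function names are illustrative only; the form of $p$ is the one that reproduces Numerical Illustration 1, and with equal group sizes it coincides with the raw agreement index.

```python
# Table 1 frequencies, indexed by test score 0..4.
pretest = [20, 10, 8, 7, 5]
posttest = [3, 1, 7, 20, 19]

def p_index(pre, post, cutoff):
    """Average of the two within-group agreement rates at a given cutoff."""
    n0, n1 = sum(pre), sum(post)
    n00 = sum(pre[:cutoff])     # pretest scores below the cutoff (nonmastery)
    n11 = sum(post[cutoff:])    # posttest scores at or above the cutoff (mastery)
    return 0.5 * (n00 / n0 + n11 / n1)

def p_max(pre, post):
    """Return (best cutoff, p-max) over the cutoffs 1..highest score."""
    values = {c: p_index(pre, post, c) for c in range(1, len(pre))}
    best = max(values, key=values.get)
    return best, values[best]

p_values = [round(p_index(pretest, posttest, c), 2) for c in range(1, 5)]
best_cutoff, best_p = p_max(pretest, posttest)
print(p_values, best_cutoff, round(best_p, 2))
```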



The p-max index does not take directly into account changes
within individual students from pretesting to posttesting. Other
indices may be more appropriate, particularly for the pretest-
posttest design. Harris (1977), for example, argues that in
studies of item sensitivity, an appropriate index would involve the
difference between the proportion of students who have learned the
item and the proportion of those who have forgotten it. The first
proportion is the probability of responding correctly on the post-
test, given that the student responded incorrectly on the pretest.
The second proportion represents the probability of responding
incorrectly following instruction, given that the response prior to
instruction was correct. This index was referred to as the Index
of Departure from Symmetry ($\delta$). To use this index for the assess-
ment of test sensitivity, a cutoff score $c$ may be selected, and
students are then classified into the two categories of mastery and
nonmastery. A $\delta$ index may then be computed, considering nonmastery
as an incorrect response and mastery as a correct one. Then, the
maximum of $\delta$, $\delta$-max, may be determined by locating the maximum of $\delta$
when the cutoff score $c$ varies within its range of possible values.
For both sets of data considered in Section 2, the $\delta$-max indices
are exactly 1.

Figure II depicts the configuration of decisions based on pre-
test and posttest data. With $c$ as a cutoff score, each student is
classified twice, once based on pretest data and again based on
posttest data. Let $i = 0$ (for nonmastery) and 1 (for mastery) be
the decision based on pretest data, and $j = 0$ or 1 for the decision
based on posttest data. In addition, let $n_{ij}$ be the number of
students in each $(i,j)$-cell, $n_0 = n_{00} + n_{01}$ be the number of stu-
dents who fail the pretest, and $n_1 = n_{10} + n_{11}$ be the number of
students who pass the pretest. Then the index $\delta$ is defined as

$$\delta = \frac{n_{01}}{n_0} - \frac{n_{10}}{n_1}. \qquad (2)$$

As previously stated, $\delta$-max is the maximum value that $\delta$ can take
within the range of possible cutoff scores.

FIGURE II

Configuration of Decisions Based
on Pretest-Posttest Data

Numerical Illustration 3

Table 2 reports the bivariate pretest-posttest frequency
distribution of 50 students on a four-item test. At the cutoff
score 3, the cell and pretest marginal frequencies are given as
$n_{00} = 8$, $n_{01} = 30$, $n_{10} = 3$, and $n_{11} = 9$; $n_0 = 38$ and $n_1 = 12$.
Hence the $\delta$ index is $\delta = .539$. At all possible cutoff scores, the
$\delta$ indices are listed as follows.

Cutoff score        1        2        3        4
$\delta$-index      .833     .867     .539    -.400

From the list it may be deduced that $\delta$-max is .867.

TABLE 2

Bivariate Frequency of Pretest-Posttest Data

                          Posttest score
Pretest score       0     1     2     3     4     Total

      4             0     0     3     1     1        5
      3             0     0     0     2     5        7
      2             0     0     0     4     4        8
      1             2     0     1     3     4       10
      0             1     1     3    10     5       20

    Total           3     1     7    20    19       50
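
The $\delta$ index, $\delta = n_{01}/n_0 - n_{10}/n_1$, can be computed directly from the bivariate table. The sketch below is illustrative (the function name is an assumption) and uses the Table 2 frequencies as transcribed here. It reproduces the listed values at the cutoff scores 2, 3, and 4. At the cutoff score 1 the table as transcribed gives about .883 rather than the listed .833, so one of the two may contain a misprint and that entry is left out of the check.

```python
# Table 2: rows are pretest scores 0..4, columns are posttest scores 0..4.
table2 = [
    [1, 1, 3, 10, 5],   # pretest score 0
    [2, 0, 1, 3, 4],    # pretest score 1
    [0, 0, 0, 4, 4],    # pretest score 2
    [0, 0, 0, 2, 5],    # pretest score 3
    [0, 0, 3, 1, 1],    # pretest score 4
]

def delta_index(table, cutoff):
    """delta = n01/n0 - n10/n1, with mastery defined as score >= cutoff."""
    size = len(table)
    # Failed the pretest, passed the posttest.
    n01 = sum(table[i][j] for i in range(cutoff) for j in range(cutoff, size))
    # Passed the pretest, failed the posttest.
    n10 = sum(table[i][j] for i in range(cutoff, size) for j in range(cutoff))
    n0 = sum(sum(row) for row in table[:cutoff])   # failed the pretest
    n1 = sum(sum(row) for row in table[cutoff:])   # passed the pretest
    return n01 / n0 - n10 / n1

deltas = {c: round(delta_index(table2, c), 3) for c in (2, 3, 4)}
print(deltas)
```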



4. AN OVERALL APPROACH TO TEST SENSITIVITY 

It may now be pointed out that point biserial correlation, 
p-max, $\delta$-max, and other similar indices provide only a global (over-
all) measure of test sensitivity. They do not provide an assessment
of the extent to which the test is sensitive at a particular ability
or test score level or in a given range of ability. For example,
it is well known that one test may provide a smaller error of mea-
surement than another; however, its relative efficiency with respect
to the other test varies as a function of examinee ability level 
(Lord, 1974). The same situation may appear in test sensitivity. 
It is conceivable that a test is able to separate two contrasting 
groups more effectively at one level of ability than at another. 

Consider now the case of instructional sensitivity. If the
test items faithfully reflect the objectives underlying the instruc-
tional unit, then a posttest score (or the score of a student who
has completed the unit) is more likely to be high than a pretest
score (or the score of a noninstructed student). Let the qualifier
"success" be applied to any posttest score and "failure" to any
pretest score. The following definitions apply to test sensitivity.

Definition 1

Let $s(\theta)$ be the probability of success at the ability (or test
score) level $\theta$. A test is said to be sensitive to the instructional
unit (or to the task for which the test is used as a predictor) in
a range of ability if $s(\theta)$ is nondecreasing (but not a constant
uniformly) within this range.

The function $s(\theta)$ may take any shape, as long as it is non-
decreasing. As defined, $s(\theta)$ is reminiscent of the concept of item
characteristic curve (Lord & Novick, 1968) and of the notion of
referral success (Huynh, 1976). The second notion is more relevant
to the psychometric foundation of mastery testing.

Now, at the ability level $\theta$, a test is more sensitive if the
probability $s(\theta)$ changes sharply at this point. The following
definition applies to the case where $s(\theta)$ has a derivative.

Definition 2

Let $\xi(\theta)$ denote the derivative of $s(\theta)$ with respect to $\theta$. This
derivative is said to be the test sensitivity at the ability level $\theta$.

It follows from the second definition that test sensitivity is
a non-negative function since $s(\theta)$ is nondecreasing. It may be
noted that $\xi(\theta)$ acts like the density of a cumulative distribution
function; hence estimation procedures associated with density
functions (Wegman, 1974) would be applicable to $\xi(\theta)$.

5. TEST SENSITIVITY AND ITEM INFORMATION 

Within the context of mastery testing (Huynh, 1976), a two-
parameter logistic form has been proposed for $s(\theta)$, namely

$$s(\theta) = \frac{e^{\alpha(\theta-\beta)}}{1+e^{\alpha(\theta-\beta)}}, \qquad (3)$$

where $\alpha > 0$ and $\beta$ are suitably chosen constants. The test sensi-
tivity function is now given as

$$\xi(\theta) = s'(\theta) = \frac{\alpha e^{\alpha(\theta-\beta)}}{(1+e^{\alpha(\theta-\beta)})^2} = \alpha s(\theta)(1-s(\theta)). \qquad (4)$$

Let $P(\theta) = s(\theta)$ and $Q(\theta) = 1-s(\theta)$. Then it may be verified that

$$\alpha\,\xi(\theta) = \frac{[P'(\theta)]^2}{P(\theta)Q(\theta)}. \qquad (5)$$

The quantity on the right of this expression represents the informa-
tion provided by a test item for which the item characteristic curve
is $P(\theta) = s(\theta)$ (Birnbaum, 1968, p. 454).
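
The relation between the logistic curve, its derivative, and the item information can be checked numerically. The following is a minimal sketch: the parameter values are arbitrary illustrative choices, and the derivative $P'(\theta)$ is approximated by a central difference rather than taken from the closed form, so the agreement is an independent check.

```python
from math import exp

def s(theta, alpha, beta):
    """Two-parameter logistic probability of success."""
    return 1.0 / (1.0 + exp(-alpha * (theta - beta)))

def sensitivity(theta, alpha, beta):
    """Local test sensitivity: closed-form derivative of s with respect to theta."""
    p = s(theta, alpha, beta)
    return alpha * p * (1.0 - p)

def item_information(theta, alpha, beta, h=1e-5):
    """P'(theta)^2 / (P(theta) Q(theta)), with P' taken by central difference."""
    dp = (s(theta + h, alpha, beta) - s(theta - h, alpha, beta)) / (2 * h)
    p = s(theta, alpha, beta)
    return dp * dp / (p * (1.0 - p))

alpha, beta = 1.5, 2.0   # illustrative values only
for theta in (0.0, 1.0, 2.0, 3.5):
    print(theta, sensitivity(theta, alpha, beta), item_information(theta, alpha, beta))
```

Under this model the sensitivity peaks at $\theta = \beta$, where it equals $\alpha/4$, and the item information equals $\alpha$ times the sensitivity at every ability level.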

6. STATISTICAL INFERENCE REGARDING TEST SENSITIVITY 
AS A MONOTONE REGRESSION PROBLEM 

Consider now a range of ability (or test score) in which a
test is suspected to be sensitive to a given instructional unit or
to a task which it is intended to predict. An inferential proce-
dure will now be presented for checking the hypothesis that $s(\theta)$ is
nondecreasing.

Let the mentioned range of ability be partitioned into $k$
mutually exclusive and exhaustive sets, namely $A_1, A_2, \ldots, A_k$, in such
a way that the number of test scores in each of the $k$ categories in
the combined pretest-posttest or instructed-noninstructed sample
are as nearly equal as possible. Let $n_1, n_2, \ldots, n_k$ be the number of
test scores which fall into each of the $A$ sets, and let $s_i$ be the
corresponding proportion of students belonging to the success
category.

Under the assumption that $s(\theta)$ is nondecreasing, the sample
proportions $s_i$ must be adjusted if necessary to reflect this pre-
imposed trend. This may be done via the Pool-Adjacent-Violator
algorithm described in Barlow, Bartholomew, Bremner, and Brunk
(1972). In essence, whenever two consecutive sample values $s_i$ and
$s_{i+1}$ are in the unexpected direction (decreasing), they are taken
as the weighted average $(n_i s_i + n_{i+1} s_{i+1})/(n_i + n_{i+1})$. This common
value will then be compared with $s_{i+2}$. If these two quantities are
not in the expected direction, then the $s_i$, $s_{i+1}$, and $s_{i+2}$ values
will be taken as equal, and equal to their weighted average.

Once the set of monotone-adjusted proportions $s_i^*$ has been obtained, the
standard chi square test for association in a $2 \times k$ contingency
table may be applied. The null hypothesis (independence) corre-
sponds to the case where $s(\theta)$ is a constant for all the $A$ cells;
the alternative (dependence) expresses the nondecreasing nature of
$s(\theta)$ with respect to $\theta$. The use of the standard chi square test in
this case was suggested by Bartholomew (1959) and Shorack (1967)
for the case where the $n_i$ are equal. Presumably the test should
hold approximately when the $n_i$ are nearly equal.
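
The pooling step can be sketched in a few lines. The code below is an illustrative implementation of the Pool-Adjacent-Violator idea (not a transcription of the procedure in Barlow et al.), applied to the Table 1 frequencies with each test score taken as one category. The first two proportions, 3/23 and 1/11, are in the decreasing direction and are pooled to 4/34.

```python
def pool_adjacent_violators(successes, totals):
    """Return nondecreasing proportions by pooling adjacent violating blocks."""
    # Each block holds [pooled successes, pooled cases, categories merged].
    blocks = []
    for x, n in zip(successes, totals):
        blocks.append([x, n, 1])
        # Merge backwards while the newest block falls below its predecessor.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            x2, n2, k2 = blocks.pop()
            blocks[-1][0] += x2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    adjusted = []
    for x, n, k in blocks:
        adjusted.extend([x / n] * k)
    return adjusted

# Table 1: successes are posttest frequencies; totals are pretest + posttest.
posttest = [3, 1, 7, 20, 19]
totals = [23, 11, 15, 27, 24]

s_star = pool_adjacent_violators(posttest, totals)
adjusted_successes = [n * p for n, p in zip(totals, s_star)]
print([round(100 * p, 2) for p in s_star])
print([round(v, 2) for v in adjusted_successes])
```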

Numerical Illustration 4

Table 3 presents detailed computations for the chi square test
based on the data of Table 1. In this table, the $A$ categories are
taken as the test score levels of 0, 1, 2, 3, and 4. As explained
previously, at each score, $n_i$ denotes the total number of cases, $s_i$
the unadjusted proportion of success, and $s_i^*$ the monotone-adjusted
proportion of success. Thus, at the same test score, the monotone-
adjusted number of cases is $n_i s_i^*$ for success and $n_i(1-s_i^*)$ for
failure. The corresponding expected frequencies are $n_i p$ and $n_i(1-p)$,
where $p$ is the proportion of success in the combined sample of test
scores. (In the case of pretest-posttest, $p = 1/2$.) The value of $\chi^2$
is now

$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i s_i^* - n_i p)^2}{n_i p} + \sum_{i=1}^{k} \frac{(n_i(1-s_i^*) - n_i(1-p))^2}{n_i(1-p)}$$

$$= \sum_{i=1}^{k} \frac{n_i(s_i^*-p)^2}{p} + \sum_{i=1}^{k} \frac{n_i(s_i^*-p)^2}{1-p}$$

$$= \frac{1}{p(1-p)} \sum_{i=1}^{k} n_i(s_i^*-p)^2. \qquad (6)$$

With the data of Table 1, the $n_i$ are equal to 23, 11, 15, 27, and
24 at the test scores of 0, 1, 2, 3, and 4. The adjusted frequen-
cies of success are 2.71, 1.29, 7.00, 20.00, and 19.00. In addi-
tion, $p = .5$. Hence $\chi^2 = 17.18$. With a standard chi-square
distribution of $k-1 = 4$ degrees of freedom, the upper tail proba-
bility at this observed $\chi^2$ value is smaller than .01. Hence the
hypothesis of test sensitivity is supported by the test data.
TABLE 3

An Example of the Adjusted Chi Square Test
for Test Sensitivity

Ability/                 $s_i$     $s_i^*$       Cell frequency       Chi-square
test score     $n_i$    (x100)    (x100)     Adjusted    Expected    contribution
($\theta_i$)                                 observed†

    0            23      13.04     11.76        2.71       11.50        6.72
    1            11       9.09     11.76        1.29        5.50        3.22
    2            15      46.67     46.67        7.00        7.50        0.03
    3            27      74.07     74.07       20.00       13.50        3.13
    4            24      79.17     79.17       19.00       12.00        4.08

  Total         100                             100         100      $\chi^2$ = 17.18††

†  computed as $n_i s_i^*$
†† df = 4; p < .01

7. ESTIMATING TEST SENSITIVITY VIA 
THE TWO-PARAMETER LOGISTIC MODEL 

Let it be assumed now that the function $s(\theta)$ can be satisfac-
torily represented by the two-parameter logistic curve

$$s(\theta) = \frac{e^{\alpha(\theta-\beta)}}{1+e^{\alpha(\theta-\beta)}},$$

and hence the test sensitivity curve will take the form
$\xi(\theta) = \alpha s(\theta)(1-s(\theta))$.

There are at least two ways to estimate the two parameters $\alpha$
and $\beta$, namely the minimum logit square method and the maximum like-
lihood (ML) procedure. The first procedure is less elegant than
the second one; however, the computations are much less demanding.

To apply the minimum logit square technique, let $\rho_i$ be the
natural logarithm of the ratio $s_i/(1-s_i)$. (Preferably, the log of
the ratio $s_i^*/(1-s_i^*)$ should be used.) Let $\theta_i$ denote the typical
ability of the test score category $A_i$. Then, at each $i$,

$$\rho_i = \alpha(\theta_i - \beta);$$

hence $\alpha$ and $\beta$ may be estimated via standard linear regression tech-
nique. They are given as

$$\hat\alpha = \frac{N\sum\theta_i\rho_i - (\sum\theta_i)(\sum\rho_i)}{N\sum\theta_i^2 - (\sum\theta_i)^2} \qquad (7)$$

and

$$\hat\beta = \frac{\hat\alpha\sum\theta_i - \sum\rho_i}{N\hat\alpha}. \qquad (8)$$

In these formulae, N is the number of cases in the combined sample.
Strictly speaking, the procedure does not work if $s_i = 0$ or 1 for
some score category, since $\rho_i$ would then be equal to $-\infty$ or $+\infty$. To
proceed with the estimation, however, it has been recommended
(Berkson, 1953) that $s_i$ be set to a small constant when it is ex-
actly zero, and a number near 1 when it is actually one.
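
A minimal sketch of the minimum logit computation follows. It takes the sums over the score categories without weights, which is what the Appendix A listing does when it forms its starting values and which reproduces the logit estimates reported for the Table 1 data. The bounds .01 and .99 used to keep $s_i$ away from 0 and 1 are those of the listing, and the function name is illustrative.

```python
from math import log

def min_logit_start(levels, pre, post, lo=0.01, hi=0.99):
    """Unweighted least-squares fit of logit(s_i) = alpha * (theta_i - beta)."""
    k = len(levels)
    logits = []
    for f0, f1 in zip(pre, post):
        prop = min(hi, max(lo, f1 / (f0 + f1)))   # keep s_i away from 0 and 1
        logits.append(log(prop / (1.0 - prop)))
    st = sum(levels)
    st2 = sum(t * t for t in levels)
    sr = sum(logits)
    stp = sum(t * r for t, r in zip(levels, logits))
    alpha = (k * stp - st * sr) / (k * st2 - st * st)
    beta = (alpha * st - sr) / (k * alpha)
    return alpha, beta

# Table 1: test score levels, pretest and posttest frequencies.
alpha0, beta0 = min_logit_start([0, 1, 2, 3, 4], [20, 10, 8, 7, 5], [3, 1, 7, 20, 19])
print(round(alpha0, 3), round(beta0, 3))
```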

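A more direct procedure to estimate $\alpha$ and $\beta$ would be an appli-
cation of the ML principle. To do this, let $\theta_i$ denote a test score
in the combined sample and $u_i$ be 1 for the success category and 0
for the failure category. Then, assuming local independence for
the success/failure classification, the likelihood function for the
combined sample may be written as

$$L = \prod_{i=1}^{N} (s(\theta_i))^{u_i}(1-s(\theta_i))^{1-u_i} = \prod_{i=1}^{N} \frac{e^{\alpha(\theta_i-\beta)u_i}}{1+e^{\alpha(\theta_i-\beta)}}.$$

Hence the log of the likelihood function will take the form

$$\log L = \sum_{i=1}^{N} \alpha(\theta_i-\beta)u_i - \sum_{i=1}^{N} \log(1+e^{\alpha(\theta_i-\beta)}). \qquad (9)$$

The partial derivatives of log L with respect to $\alpha$ and $\beta$ are given as

$$\frac{\partial \log L}{\partial \alpha} = \sum_{i=1}^{N} (\theta_i-\beta)u_i - \sum_{i=1}^{N} \frac{(\theta_i-\beta)e^{\alpha(\theta_i-\beta)}}{1+e^{\alpha(\theta_i-\beta)}} \qquad (10)$$

and

$$\frac{\partial \log L}{\partial \beta} = -\alpha\sum_{i=1}^{N} u_i + \alpha\sum_{i=1}^{N} \frac{e^{\alpha(\theta_i-\beta)}}{1+e^{\alpha(\theta_i-\beta)}}. \qquad (11)$$

By setting these two partial derivatives to zero, the values for $\alpha$
and $\beta$ may be found. The process will lead to the following simpler
equations:

$$G(\alpha,\beta) = \sum_{i=1}^{N} \frac{e^{\alpha(\theta_i-\beta)}}{1+e^{\alpha(\theta_i-\beta)}} - \sum_{i=1}^{N} u_i = 0, \qquad (12)$$

$$F(\alpha,\beta) = \sum_{i=1}^{N} \frac{\theta_i e^{\alpha(\theta_i-\beta)}}{1+e^{\alpha(\theta_i-\beta)}} - \sum_{i=1}^{N} \theta_i u_i = 0. \qquad (13)$$

Equations (12) and (13) may be solved via iteration procedures such
as the Newton-Raphson process. The process requires the following
partial derivatives:

$$G'_\alpha = \sum_{i=1}^{N} (\theta_i-\beta)s(\theta_i)(1-s(\theta_i)), \qquad (14)$$

$$G'_\beta = -\alpha\sum_{i=1}^{N} s(\theta_i)(1-s(\theta_i)), \qquad (15)$$

$$F'_\alpha = \sum_{i=1}^{N} \theta_i(\theta_i-\beta)s(\theta_i)(1-s(\theta_i)), \qquad (16)$$

and

$$F'_\beta = -\alpha\sum_{i=1}^{N} \theta_i s(\theta_i)(1-s(\theta_i)). \qquad (17)$$

Let $\alpha_0$ and $\beta_0$ be two starting values for $\alpha$ and $\beta$. Then the Newton-
Raphson iterated values $\alpha_1$ and $\beta_1$ satisfy the linear equations

$$(\alpha_1-\alpha_0)G'_\alpha(\alpha_0,\beta_0) + (\beta_1-\beta_0)G'_\beta(\alpha_0,\beta_0) = -G(\alpha_0,\beta_0),$$

$$(\alpha_1-\alpha_0)F'_\alpha(\alpha_0,\beta_0) + (\beta_1-\beta_0)F'_\beta(\alpha_0,\beta_0) = -F(\alpha_0,\beta_0). \qquad (18)$$

Hence $\alpha_1$ and $\beta_1$ are given as

$$\alpha_1 = \alpha_0 - \left(G(\alpha_0,\beta_0)F'_\beta(\alpha_0,\beta_0) - F(\alpha_0,\beta_0)G'_\beta(\alpha_0,\beta_0)\right)/\Delta$$

and

$$\beta_1 = \beta_0 + \left(G(\alpha_0,\beta_0)F'_\alpha(\alpha_0,\beta_0) - F(\alpha_0,\beta_0)G'_\alpha(\alpha_0,\beta_0)\right)/\Delta,$$

where $\Delta = G'_\alpha(\alpha_0,\beta_0)F'_\beta(\alpha_0,\beta_0) - F'_\alpha(\alpha_0,\beta_0)G'_\beta(\alpha_0,\beta_0)$.

The iteration process continues until convergence is assured
to a satisfactory degree.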
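
The Newton-Raphson iteration can be sketched as follows for grouped data, where each score level contributes its total frequency to the sums. This is an illustrative rewrite of the computation, not the listing of Appendix A; the function names, the tolerance, and the iteration cap are assumptions. The starting values are the minimum logit estimates for the Table 1 data.

```python
from math import exp

def newton_step(levels, n, su, stu, a, b):
    """One Newton-Raphson correction (ea, eb) for the equations G = 0, F = 0."""
    g, f = -su, -stu
    ga = gb = fa = fb = 0.0
    for t, ni in zip(levels, n):
        p = 1.0 / (1.0 + exp(-a * (t - b)))   # s(theta_i)
        w = ni * p * (1.0 - p)
        g += ni * p
        f += ni * t * p
        ga += (t - b) * w        # dG/d(alpha)
        gb -= a * w              # dG/d(beta)
        fa += t * (t - b) * w    # dF/d(alpha)
        fb -= a * t * w          # dF/d(beta)
    d = ga * fb - fa * gb
    return -(g * fb - f * gb) / d, (g * fa - f * ga) / d

def fit_ml(levels, pre, post, a, b, tol=1e-8, max_iter=100):
    """Iterate from the starting values (a, b) until both corrections are small."""
    n = [f0 + f1 for f0, f1 in zip(pre, post)]
    su = sum(post)                                    # sum of u_i
    stu = sum(t * f1 for t, f1 in zip(levels, post))  # sum of theta_i * u_i
    for _ in range(max_iter):
        ea, eb = newton_step(levels, n, su, stu, a, b)
        a, b = a + ea, b + eb
        if max(abs(ea), abs(eb)) < tol:
            break
    return a, b

alpha_ml, beta_ml = fit_ml([0, 1, 2, 3, 4], [20, 10, 8, 7, 5], [3, 1, 7, 20, 19],
                           a=0.982, b=2.397)
print(round(alpha_ml, 3), round(beta_ml, 3))
```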
Numerical Illustration 5

For the data of Table 1, the logit square procedure based on
the unadjusted proportions $s_i$ yields the estimates $\hat\alpha = .982$ and
$\hat\beta = 2.397$. The maximum likelihood procedure results in the esti-
mates $\hat\alpha = .947$ and $\hat\beta = 2.244$.

Within the logistic model the traditional asymptotic likelihood
ratio test may be used to check the hypothesis of test sensitivity.
The log likelihood associated with ML estimation for $\alpha$ and $\beta$ is
equal to $\log L(\hat\alpha,\hat\beta)$, where log L is given in Equation (9). When
the test shows no sensitivity, then the probability $s(\theta_i)$ is uni-
formly equal to $p_0 = n_0/(n_0+n_1)$. (This probability is equal to 1/2
for the pretest-posttest design.) The corresponding log likelihood
is given as $\log L_0 = n_0 \log p_0 + n_1 \log(1-p_0)$. The asymptotic
likelihood ratio test is carried out via the quantity
$\chi^2 = \log L(\hat\alpha,\hat\beta) - \log L_0$, which is distributed approximately as a
chi square distribution with one degree of freedom. With the data
referred to in Numerical Illustration 5, for example, it was found
that $\log L(\hat\alpha,\hat\beta) = -51.718$ and $\log L_0 = -69.315$. Hence $\chi^2 = 17.597$,
which corresponds to an upper tail probability of less than .01.
Thus, the data show strong evidence of test sensitivity.
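
The two log likelihoods can be recomputed from the Table 1 frequencies as a check. The sketch below evaluates the log likelihood at the rounded ML estimates reported above and forms the difference exactly as the quantity is defined in the text; it may be noted that the conventional likelihood ratio statistic is twice this difference. The function names are illustrative.

```python
from math import exp, log

def log_likelihood(levels, pre, post, a, b):
    """Sum of a(theta_i - b)u_i minus sum of log(1 + exp(a(theta_i - b)))."""
    total = 0.0
    for t, f0, f1 in zip(levels, pre, post):
        z = a * (t - b)
        total += f1 * z - (f0 + f1) * log(1.0 + exp(z))
    return total

def null_log_likelihood(pre, post):
    """Log likelihood when the success probability is constant."""
    n0, n1 = sum(pre), sum(post)
    p0 = n0 / (n0 + n1)
    return n0 * log(p0) + n1 * log(1.0 - p0)

levels, pre, post = [0, 1, 2, 3, 4], [20, 10, 8, 7, 5], [3, 1, 7, 20, 19]
log_l1 = log_likelihood(levels, pre, post, a=0.947, b=2.244)
log_l0 = null_log_likelihood(pre, post)
statistic = log_l1 - log_l0   # the difference as defined in the text
print(round(log_l1, 3), round(log_l0, 3), round(statistic, 3))
```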

Appendix A provides a listing of a computer program for the 
computations described in this section. 

8. SUMMARY 

This paper has discussed test sensitivity in mastery testing. 
Arguments have been presented to show that correlation-based
indices may not be appropriate for assessing the sensitivity of
mastery tests. Instead, indices such as p-max or $\delta$-max are advo-
cated for the global assessment of test sensitivity, while local
measures of sensitivity may be obtained using a two-parameter 
logistic model. Finally, procedures are described to test the 
tenability of the hypothesis of test sensitivity on the basis of 
observed test data. 
APPENDIX A

Listing of a Computer Program for Assessing Test Sensitivity 
via the Two-Parameter Logistic Model 

Disclaimer: The computer program hereafter listed has been written
with care and tested extensively under a variety of conditions
using tests with 60 or fewer items. The author, however, makes no
warranty as to its accuracy and functioning, nor shall the fact of 
its distribution imply such warranty. 
C     THIS PROGRAM COMPUTES THE MAXIMUM LIKELIHOOD ESTIMATES OF THE
C     ALPHA AND BETA PARAMETERS WHICH FORM THE BASIS FOR ASSESSING
C     TEST SENSITIVITY.
C     INPUT DATA ARE LISTED AS FOLLOWS.
C     FIRST CARD:  TITLE CARD (ENTER ANYTHING YOU WANT.)
C     SECOND CARD: NUMBER (M) OF TEST SCORE/ABILITY LEVELS (I5)
C     THIRD CARD:  FORMAT CARD FOR EACH OF THE M FOLLOWING CARDS
C     M CARDS:     EACH CONTAINS THE TEST SCORE LEVEL, THE FREQUENCY
C                  OF THE PRETEST/UNINSTRUCTED GROUP, AND THE
C                  FREQUENCY OF THE POSTTEST/INSTRUCTED GROUP.  EACH
C                  CARD IS TO BE KEYPUNCHED ACCORDING TO THE FORMAT
C                  ENTERED VIA THE THIRD CARD.
C     SEVERAL PROBLEMS MAY BE PERFORMED IN ONE RUN BY STACKING THE
C     INPUT CARDS TOGETHER.
C     THIS PROGRAM IS WRITTEN FOR TESTS WITH UP TO 61 LEVELS OF TEST
C     SCORE OR ABILITY.  FOR LONGER TESTS, REDIMENSION T AND N TO BE
C     T(M) AND N(M), M BEING THE NUMBER OF LEVELS.
C
      DIMENSION T(61),N(61),FCT(20)
      DOUBLE PRECISION A,B,EA,EB,EPS,DELTA
      EPS=.00001
C     THE ACCUMULATORS ARE RESET AT STATEMENT 5 SO THAT STACKED
C     PROBLEMS DO NOT CARRY SUMS OVER FROM THE PREVIOUS PROBLEM.
    5 NTOT=0
      SU=0.
      STU=0.
      ST=0.
      ST2=0.
      SR=0.
      STR=0.
      READ(5,95,END=99) FCT
   95 FORMAT(20A4)
      WRITE(6,195) FCT
  195 FORMAT(T2,'ANALYSIS OF TEST SENSITIVITY VIA THE LOGISTIC MODEL'/
     &       T2,'***************************************************'/
     &       T2,'TITLE OF THIS PROBLEM IS:'/T2,20A4)
      READ(5,96) M
   96 FORMAT(I5)
      WRITE(6,196) M
  196 FORMAT(T2,'NUMBER OF TEST SCORE/ABILITY LEVELS:',I5)
      READ(5,97) FCT
   97 FORMAT(20A4)
      WRITE(6,197) FCT
  197 FORMAT(T2,'INPUT FORMAT FOR FREQUENCY DATA:'/T2,20A4)
      WRITE(6,198)
  198 FORMAT(T2,'FREQUENCY DISTRIBUTION'/
     &  T2,' SCORE   PRETEST/UNINSTRUCTED   POSTTEST/INSTRUCTED'
     &  /T2,' LEVEL          GROUP                  GROUP'/
     &  T2,' *****   ********************   *******************')
C     READ THE FREQUENCIES AND ACCUMULATE THE SUMS NEEDED FOR THE
C     MINIMUM LOGIT STARTING VALUES.
      DO 20 K=1,M
      READ(5,FCT) T(K),NLOWER,NUPPER
      WRITE(6,200) T(K),NLOWER,NUPPER
  200 FORMAT(T2,F8.2,T21,I3,T44,I3)
      N(K)=NLOWER+NUPPER
      NTOT=NTOT+N(K)
      R=FLOAT(NUPPER)/FLOAT(N(K))
      SU=SU+NUPPER
      STU=STU+T(K)*NUPPER
      R=AMAX1(.01,R)
      R=AMIN1(.99,R)
      R=ALOG(R/(1.-R))
      ST=ST+T(K)
      ST2=ST2+T(K)**2
      SR=SR+R
   20 STR=STR+T(K)*R
      A=(M*STR-ST*SR)/(M*ST2-ST*ST)
      B=(A*ST-SR)/(M*A)
      WRITE(6,215) A,B
  215 FORMAT(T2,'STARTING VALUES BASED ON MINIMUM LOGIT'/
     &  T17,'ALPHA = ',F10.5/T17,'BETA  = ',F10.5)
C     NEWTON-RAPHSON ITERATION.  CONVERGENCE IS JUDGED ON THE LARGER
C     OF THE ABSOLUTE CORRECTIONS TO ALPHA AND BETA.
   30 CALL NEWTON(M,N,T,SU,STU,A,B,EA,EB)
      DELTA=DMAX1(DABS(EA),DABS(EB))
      IF(DELTA.LT.EPS) GOTO 40
      A=A+EA
      B=B+EB
      GOTO 30
   40 WRITE(6,220) A,B
  220 FORMAT(T2,'FINAL RESULTS:  ALPHA = ',F10.5/
     *  T2,'                BETA  = ',F10.5//)
      H1=A*(STU-B*SU)
      P=SU/NTOT
      DO 50 I=1,M
   50 H1=H1-N(I)*DLOG(1.+DEXP(A*(T(I)-B)))
      H0=SU*ALOG(P)+(NTOT-SU)*ALOG(1.-P)
      CHISQ=H1-H0
      WRITE(6,221) H1,H0,CHISQ
  221 FORMAT(T2,'LOG OF THE LIKELIHOOD FUNCTION'/
     &  T2,'  WITH TEST SENSITIVITY:  ',F10.5/
     &  T2,'  NO TEST SENSITIVITY..:  ',F10.5/
     &  T2,'CHI-SQUARE STATISTIC ...: ',F10.5/
     &  T2,'WITH ONE DEGREE OF FREEDOM.')
      GOTO 5
   99 WRITE(6,225)
  225 FORMAT(T2,'**NORMAL END OF JOB**'/
     &  T2,'  PROGRAM WRITTEN BY HUYNH HUYNH'/
     &  T2,'  COLLEGE OF EDUCATION'/
     &  T2,'  UNIVERSITY OF SOUTH CAROLINA'/
     &  T2,'  COLUMBIA, SOUTH CAROLINA 29208'/
     &  T2,'  JULY 1980')
      STOP
      END
C
      SUBROUTINE NEWTON(K,N,T,SU,STU,A,B,EA,EB)
      DIMENSION N(1),T(1)
      DOUBLE PRECISION S,G,F,GA,GB,FA,FB,D,E,P,A,B,EA,EB
      G=-SU
      F=-STU
      FA=0.D0
      FB=0.D0
      GA=0.D0
      GB=0.D0
C
      DO 10 I=1,K
      E=DEXP(A*(T(I)-B))
      P=E/(E+1.D0)
      S=P*(1.D0-P)
      G=G+P*N(I)
      F=F+N(I)*T(I)*P
      GA=GA+N(I)*(T(I)-B)*S
      GB=GB-A*S*N(I)
      FA=FA+N(I)*T(I)*(T(I)-B)*S
   10 FB=FB-A*T(I)*S*N(I)
      D=GA*FB-FA*GB
      EA=-(G*FB-F*GB)/D
      EB=(G*FA-F*GA)/D
      RETURN
      END

SELECTING ITEMS AND SETTING PASSING SCORES FOR MASTERY TESTS
BASED ON THE TWO-PARAMETER LOGISTIC MODEL 



Huynh Huynh 
University of South Carolina 

Presented at the Informal Meeting on Model-Based Psychological
Measurement sponsored by the Office of Naval Research, Iowa City,
Iowa, August 17-22, 1986.

ABSTRACT 

Three issues in mastery testing are considered, using a 
minimax decision framework, based on the two-parameter logistic 
model. The issues are: (1) setting passing scores, (2) assessing 
decision efficiency, and (3) selecting items to maximize decision 
efficiency. The losses or disutilities under consideration have a 
constant or normal ogive form. It is found that, in the context of 
minimax decisions, the item selection procedure based on maximum 
information may not provide the best decision efficiency. 



1. INTRODUCTION

A primary purpose of mastery testing is to classify each 
examinee in one of several achievement (or ability) categories. 
Typically there are two such categories, commonly labeled mastery 
and nonmastery. Let $\theta$ be the ability or trait being measured. On
the $\theta$ scale, the status of mastery is defined by the condition
$\theta \geq \theta_0$, and that of nonmastery by $\theta < \theta_0$, where $\theta_0$ is a prespecified
constant often referred to as a true mastery score. (As can be seen
later, the postulated existence of $\theta_0$ is justified when the losses
or utilities associated with the decision problem fulfill fairly
reasonable assumptions.) In most practical situations, however, $\theta$
is not known, and mastery/nonmastery decisions are usually based on
the responses of the examinee to a relevant set of items. Three
issues thus emerge, which deal with (1) scoring item responses,
(2) setting a test passing score, and (3) selecting test items which
serve best (in some sense) the process of classification (mastery
testing).

Within the context of Bayesian decision theory as applied to 
the case of constant losses, and considering tolerable limits on 
the probabilities of making false positive ($\alpha$) and false negative
($\beta$) errors, Birnbaum (1968) and Lord (1980) have given considerable
attention to the three issues mentioned above. The treatment devel- 
oped by Birnbaum does not seem to lead to an easy generalization to 
situations involving other than constant losses, and the discussion 
by Lord, at times, moves from Bayesian decision theory to confidence 
interval estimation without a strong link of continuity. 

The purpose of this paper is to provide a consideration of the 
aforementioned issues in mastery testing, using a minimax decision 
framework. Consideration is restricted to a two-parameter logistic 
model in which a sufficient statistic exists for the estimation of 
ability. A minimax treatment of mastery testing which involves the
simple binomial error model may be found in Huynh (1980), and in 
Wilcox (1976) in another form. 

2. SUFFICIENCY, MONOTONE LIKELIHOOD RATIO, 
AND MONOTONE DECISION PROBLEMS 

Consider a test consisting of n items (indexed by $i = 1, 2, \ldots, n$)
for which the item response $u_i$ of an examinee with ability $\theta$ follows
a two-parameter logistic model with item difficulty $b_i$ and item
discrimination $a_i$. It is well known that the composite test score

$$x = \sum_{i=1}^{n} a_i u_i$$

is a sufficient statistic for estimating $\theta$, and that the conditional
density $f(x|\theta)$ has the monotone likelihood ratio property (Birnbaum,
1968, sec. 19.4). Sufficiency implies (Ferguson, 1967, p. 120,
Theorem 1) that any decision problem focusing on $\theta$ may be simply
based on the test score x since the set of decision rules based on x
forms an essentially complete class. In other words, for any decision
rule based on the vector of responses $(u_1, u_2, \ldots, u_n)$, there is
always a decision rule based on x which performs at least as well as
the given rule in terms of risk (or expected loss).
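The sufficiency of x can be seen from a factorization of the likelihood. The short derivation below is a sketch that uses the item response function $p_i(\theta)$ of the two-parameter logistic model (stated later as Equation (11)); writing $u$ for the whole response vector is a notational assumption made here for compactness.

```latex
\begin{align*}
P(u \mid \theta)
  &= \prod_{i=1}^{n} p_i(\theta)^{u_i}\,[1-p_i(\theta)]^{1-u_i}
   = \prod_{i=1}^{n} \frac{e^{a_i(\theta-b_i)u_i}}{1+e^{a_i(\theta-b_i)}} \\
  &= \underbrace{\frac{e^{\theta x}}
        {\prod_{i=1}^{n}\bigl[1+e^{a_i(\theta-b_i)}\bigr]}}_{\text{involves } u \text{ only through } x}
     \;\cdot\;
     \underbrace{e^{-\sum_{i=1}^{n} a_i b_i u_i}}_{\text{free of } \theta},
  \qquad x=\sum_{i=1}^{n} a_i u_i .
\end{align*}
```

Since the first factor depends on the responses only through x and the second factor does not involve $\theta$, the factorization criterion gives the sufficiency of x.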

Consider now the action ($a_1$) of granting mastery status and
the action ($a_2$) of denying mastery status to an examinee with
ability $\theta$. Let $L_1(\theta)$ and $L_2(\theta)$ be the losses (disutilities)
associated with the two actions $a_1$ and $a_2$. In practical situations, it
seems reasonable to assume that $L_1(\theta)$ is nonincreasing in $\theta$ and
$L_2(\theta)$ is nondecreasing in $\theta$. In other words, granting mastery
status should cause less harm to an examinee with high ability than
to someone with low ability. The reverse should hold for the act
of denying mastery status. When the graphs of $L_1(\theta)$ and $L_2(\theta)$ do
not cross, either action $a_1$ or action $a_2$ is uniformly better than
the other at all ability levels $\theta$; hence the choice for the best
course of action would be either $a_1$ or $a_2$ regardless of the observed
test score x. This "degenerate" case does not represent a typical
use of test data; hence it seems reasonable to assume that the
graphs of $L_1(\theta)$ and $L_2(\theta)$ cross at least at one point. Due to the
nondecreasing nature of the difference $L_2(\theta) - L_1(\theta)$, crossing can
occur only once. Hence, there exists one ability level $\theta_0$ such
that $L_1(\theta) > L_2(\theta)$ for $\theta < \theta_0$ and $L_1(\theta) \le L_2(\theta)$ for $\theta \ge \theta_0$. Under
these conditions, the decision problem is said to be monotone
(Ferguson, 1967, chap. 6). It may then be noted that, in terms of
loss, action $a_1$ is best when $\theta \ge \theta_0$ and action $a_2$ is best when
$\theta < \theta_0$.

Within the monotone decision problem as stated and with the
monotone likelihood ratio property for the density $f(x|\theta)$, it is
well known (Ferguson, 1967, p. 286; Zacks, 1971, ch. 9) that the
search for an optimum decision rule may be restricted to the
(essentially complete) class of decision rules defined by
$a_1 = \{x;\ x \ge c\}$ and $a_2 = \{x;\ x < c\}$, where c is a suitable test
passing score. At each potential passing score c, the expected loss is

$$R(c;\theta) = L_1(\theta)\,P(x \ge c \mid \theta) + L_2(\theta)\,P(x < c \mid \theta). \qquad (1)$$

A minimax passing score $c_0$ is the score which minimizes the maximum
of $R(c;\theta)$ with respect to $\theta$. (For the sake of simplicity, it is
assumed that the search for maximum and minimum can be accomplished.)

Consider now the maximum $G(\theta)$ of the two losses $L_1(\theta)$ and
$L_2(\theta)$. It is given as $G(\theta) = L_1(\theta)$ for $\theta < \theta_0$, and $G(\theta) = L_2(\theta)$
for $\theta \ge \theta_0$. The expected loss $R(c;\theta)$ may now be written as

$$R(c;\theta) = G(\theta) + (L_2(\theta) - L_1(\theta))\,P(x < c \mid \theta)$$

for $\theta < \theta_0$, and as

$$R(c;\theta) = G(\theta) + (L_1(\theta) - L_2(\theta))\,P(x \ge c \mid \theta)$$

for $\theta \ge \theta_0$. The quantity $C_f(\theta) = L_1(\theta) - L_2(\theta)$, $\theta < \theta_0$, represents
the opportunity loss due to a false positive error (granting mastery
to a nonmaster), and the quantity $C_s(\theta) = L_2(\theta) - L_1(\theta)$, $\theta \ge \theta_0$,
denotes the opportunity loss due to a false negative error (denying
mastery to a master). Opportunity losses are zero when correct
decisions, namely the two combinations ($\theta < \theta_0$, $x < c$) and
($\theta \ge \theta_0$, $x \ge c$), are made. Thus, as indicated in this special case,
solutions for a monotone decision problem may be found by looking at
the original losses, or at the corresponding opportunity losses.
Additional examples of this duality may be found in elementary
textbooks such as Schlaifer (1969).

Due to the duality as presented, both losses and opportunity 
losses will be considered in the remaining part of this paper. 
Thus, for opportunity losses $C_f(\theta)$ will be taken as zero when
$\theta \ge \theta_0$ and $C_s(\theta)$ as zero when $\theta < \theta_0$. In all other cases, both
$C_f(\theta)$ and $C_s(\theta)$ are nonnegative, with $C_f(\theta)$ being nonincreasing and
$C_s(\theta)$ nondecreasing in $\theta$.

3. MINIMAX PASSING SCORE AND DECISION EFFICIENCY 
The risk $R(c;\theta)$ may now be written as follows:

$$R(c;\theta) = \begin{cases} C_f(\theta)\,P(x \ge c \mid \theta) & \text{for } \theta < \theta_0 \\ C_s(\theta)\,P(x < c \mid \theta) & \text{for } \theta \ge \theta_0 \end{cases} \qquad (2)$$

Now let

$$L_1(c) = \sup_{\theta < \theta_0} C_f(\theta)\,P(x \ge c \mid \theta) \qquad (3)$$

and

$$L_2(c) = \sup_{\theta \ge \theta_0} C_s(\theta)\,P(x < c \mid \theta). \qquad (4)$$

Then the maximum (or supremum) of $R(c;\theta)$ over $\theta$ is

$$M(c) = \max\{L_1(c), L_2(c)\}.$$

The optimum (minimax) passing score is the test score $c_0$ at which
M(c) is minimized. The minimum (or infimum) value of M(c), henceforth
denoted as $R_0$, is traditionally referred to as the minimax
value of the decision problem (Ferguson, 1967, p. 33).

Consider now the extreme case where the score x does not
reveal the true ability $\theta$, e.g., when x and $\theta$ are stochastically
independent. Let

$$C_f^* = \sup_{\theta < \theta_0} C_f(\theta)$$

and

$$C_s^* = \sup_{\theta \ge \theta_0} C_s(\theta).$$

In the case where both $C_f^*$ and $C_s^*$ are finite, the minimax passing
score $c^*$ satisfies the equation

$$C_f^*\,P(x \ge c^*) = C_s^*\,P(x < c^*).$$

In other words, when there is no relationship between x and $\theta$, it
is best to randomly assign mastery with a probability of
$C_s^*/(C_s^* + C_f^*)$ and nonmastery with a probability of $C_f^*/(C_s^* + C_f^*)$.
The minimax value of the decision situation is then

$$R^* = C_f^*\,C_s^*/(C_f^* + C_s^*). \qquad (5)$$
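The randomization argument can be checked numerically. The following is a minimal sketch, not part of the original study: the function names are hypothetical, and the values 1 and 2 for the two suprema are chosen only for illustration.

```python
def no_data_minimax(c_f_star, c_s_star):
    """Return (probability of granting mastery, minimax value R*) as in Equation (5)."""
    total = c_f_star + c_s_star
    p_mastery = c_s_star / total
    r_star = c_f_star * c_s_star / total
    return p_mastery, r_star


def worst_case_risk(p_mastery, c_f_star, c_s_star):
    """Largest expected opportunity loss when mastery is granted at random."""
    risk_if_nonmaster = c_f_star * p_mastery          # false positive
    risk_if_master = c_s_star * (1.0 - p_mastery)     # false negative
    return max(risk_if_nonmaster, risk_if_master)


# Illustrative values only: C_f* = 1 and C_s* = 2.
p, r_star = no_data_minimax(1.0, 2.0)
print(p, r_star, worst_case_risk(p, 1.0, 2.0))
```

At the randomization probability returned by `no_data_minimax` the two worst-case risks coincide, which is why no other assignment probability can do better in the minimax sense.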

It may be recalled that opportunity losses are zero when the
decisions are correct. Hence, when the test score x reveals fully
the nature of the ability $\theta$, the minimax value is zero. This
observation along with the nature of $R_0$ and $R^*$ suggests the use of the
quantity $\eta = (R^* - R_0)/R^*$ as an index to measure the efficiency of
using test scores in making mastery/nonmastery decisions. This
efficiency index measures the extent to which the best use of test
data will reduce the amount of risk which would be expected had the
test data not been used at all. It is a function of the opportunity
losses $C_f(\theta)$ and $C_s(\theta)$, and of the item parameters $a_i$ and $b_i$.

As defined, the efficiency index $\eta$ is computable only when both
$C_f^*$ and $C_s^*$ are finite. This means that the opportunity losses $C_s(\theta)$
and $C_f(\theta)$ are not allowed to drift out of bounds when $\theta$ goes to
infinity. Hence, efficiency is not defined for linear or quadratic
losses if these are expressed as a direct function of $\theta$. However,
as Novick and Lindley (1978) point out, it seems sensible to demand
that losses or utilities be bounded, at least in the context of
educational and psychological testing. This assumption will be
made throughout the remaining part of this paper.

With the efficiency index now defined, the design of a mastery
test may be accomplished by deciding on the number of test items,
n, and selecting the test items such that the resulting efficiency
index would be equal or nearly equal to a specified level.

It seems intuitively true that as the number of test items
increases, the efficiency index will increase. However, when the
situation permits, a short test is preferable to a lengthy one.
Hence, a balance seems appropriate between efficiency and test
length. As a passing remark, one may express the latter trait as a
function of n, say $\xi(n)$, and then search for an n value at which
the product of $\xi(n)$ with the efficiency index $\eta(n)$ is maximized.

4. DESIGNING A MASTERY TEST FOR
THE CASE OF CONSTANT LOSSES

For technical reasons which should be apparent from the work of
Birnbaum (1968, ch. 19), the case of constant losses in minimax
decision problems may be represented by the following functions:

$$C_f(\theta) = \begin{cases} 1 & \text{if } \theta \le \theta_0 \\ 0 & \text{if } \theta > \theta_0 \end{cases} \qquad (6)$$

and

$$C_s(\theta) = \begin{cases} Q & \text{if } \theta_0 + \varepsilon \le \theta \\ 0 & \text{if } \theta < \theta_0 + \varepsilon, \end{cases} \qquad (7)$$

where Q is a constant. The region $\{\theta;\ \theta_0 < \theta < \theta_0 + \varepsilon\}$ is an
indifference zone. For any examinee whose true ability falls within
this range, it does not matter whether action $a_1$ or $a_2$ is taken. The
constant Q is the ratio of the loss due to a false negative error to
the loss due to a false positive error. (It may also be said simply
that the false negative error and the false positive error are
weighted according to the ratio Q : 1.)

The risk $R(c;\theta)$ of Equation (2) may now be expressed as follows:

$$R(c;\theta) = \begin{cases} P(x \ge c \mid \theta) & \text{for } \theta \le \theta_0 \\ Q\,P(x < c \mid \theta) & \text{for } \theta \ge \theta_0 + \varepsilon. \end{cases} \qquad (8)$$

As elaborated in Section 2, the conditional density $f(x|\theta)$ belongs
to the monotone likelihood ratio family. It follows from Dykstra,
Hewett, and Thompson (1973) that x and $\theta$ are stochastically
increasing in sequence; hence the maximum value of $P(x < c \mid \theta)$ over
$\theta \ge \theta_0 + \varepsilon$ occurs at $\theta = \theta_0 + \varepsilon$ and that of $P(x \ge c \mid \theta)$ over
$\theta \le \theta_0$ occurs at $\theta = \theta_0$. Thus $L_1(c)$ and $L_2(c)$ of Equations (3)
and (4), and the maximum risk M(c), become

$$L_1(c) = P(x \ge c \mid \theta = \theta_0), \qquad (9)$$

$$L_2(c) = Q\,P(x < c \mid \theta = \theta_0 + \varepsilon), \qquad (10)$$

and

$$M(c) = \max\{L_1(c), L_2(c)\}.$$

It may be noted that, as a function of c, $L_1(c)$ is nonincreasing
and varies from 1 to 0. As for $L_2(c)$, it is nondecreasing and
varies from 0 to Q. If the test score x can be assumed to be
continuous, then the minimum of M(c) will occur at $c_0$ where

$$L_1(c_0) = L_2(c_0).$$

Consider now the special case where $\varepsilon = 0$. Then the minimax
passing score $c_0$ satisfies the equation

$$P(x \ge c_0 \mid \theta = \theta_0) = Q\,P(x < c_0 \mid \theta = \theta_0),$$

or

$$P(x < c_0 \mid \theta = \theta_0) = 1/(Q+1).$$

The minimax value of the decision problem is $R_0 = Q/(Q+1)$ regardless
of the nature of the items which form the test. In addition, the
minimax value encountered when the test data are not used is
$R^* = Q/(Q+1)$; thus the decision efficiency index $\eta$ is zero. (This
conclusion is consistent with the observation by Wilcox (1977) that
when $\varepsilon = 0$, the process of randomly assigning an examinee to mastery
and nonmastery status, each with a probability of .5, would encounter
no more maximum error than any attempt to use test data.) Thus,
when there is no indifference zone separating masters and nonmasters
on the ability scale, there is no way to design a test which will
add any efficiency to the minimax decision-making process. For
this reason, the constant $\varepsilon$ shall be assumed to be strictly positive
in the remaining part of this section.

As may be seen from Equations (9) and (10), $L_1(c)$ decreases
from 1 to 0 and $L_2(c)$ increases from 0 to Q when the passing score
c spans the range of possible values. If the test score can be
assumed to be continuous, then the minimax passing score $c_0$ is the
one at which $L_1(c) = L_2(c)$. Otherwise, $c_0$ is one (or both) of the
two scores which lie nearest to the location at which the graphs of
$L_1(c)$ and $L_2(c)$ meet. As before, the minimax passing score is the
test score at which M(c) is the smallest.

5. APPROXIMATE SOLUTION FOR MINIMAX
PASSING SCORE FOR CONSTANT LOSSES

Let the test now consist of n items. Each item is associated
with a characteristic function defined by the probability that the
item response is correct, namely

$$p_i(\theta) = \frac{e^{a_i(\theta - b_i)}}{1 + e^{a_i(\theta - b_i)}}. \qquad (11)$$

Let the (composite) test score be $x = \sum_{i=1}^{n} a_i u_i$. The mean and the
variance of the test score x are given respectively as

$$\mu(\theta) = \sum_{i=1}^{n} a_i\,p_i(\theta) \qquad (12)$$

and

$$\sigma^2(\theta) = \sum_{i=1}^{n} a_i\,p_i(\theta)\,q_i(\theta), \qquad (13)$$

where $q_i(\theta) = 1 - p_i(\theta)$.

When there are a sufficient number of items forming the test,
the conditional distribution of x, given $\theta$, may be approximated by
the normal distribution with mean $\mu(\theta)$ and standard deviation $\sigma(\theta)$.



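The minimax passing score $c_0$ now satisfies the equation

$$P(x \ge c_0 \mid \theta = \theta_0) = Q\,P(x < c_0 \mid \theta = \theta_0 + \varepsilon). \qquad (14)$$

Let $\Phi(\cdot)$ denote the cumulative distribution function of a unit
normal variable (with zero mean and unit variance). Then $c_0$ is the
solution of the equation

$$1 - \Phi\!\left[\frac{c_0 - \mu(\theta_0)}{\sigma(\theta_0)}\right] = Q\,\Phi\!\left[\frac{c_0 - \mu(\theta_0 + \varepsilon)}{\sigma(\theta_0 + \varepsilon)}\right]. \qquad (15)$$

This equation may be solved numerically via the Newton-Raphson
iteration process. To do this, let the function H be defined as

$$H(c) = \Phi\!\left[\frac{c - \mu(\theta_0)}{\sigma(\theta_0)}\right] + Q\,\Phi\!\left[\frac{c - \mu(\theta_0 + \varepsilon)}{\sigma(\theta_0 + \varepsilon)}\right] - 1. \qquad (16)$$

The derivative of H with respect to c is given as

$$H'(c) = \frac{1}{\sigma(\theta_0)}\,\phi\!\left[\frac{c - \mu(\theta_0)}{\sigma(\theta_0)}\right] + \frac{Q}{\sigma(\theta_0 + \varepsilon)}\,\phi\!\left[\frac{c - \mu(\theta_0 + \varepsilon)}{\sigma(\theta_0 + \varepsilon)}\right], \qquad (17)$$

where $\phi(\cdot)$ is the density of the unit normal variable. In other
words,

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}. \qquad (18)$$

To proceed with the Newton-Raphson process, a starting value
$c_1$ for the passing score must be found. This may be taken as the
average of the two c values at which

$$\Phi\!\left[\frac{c - \mu(\theta_0)}{\sigma(\theta_0)}\right] = \frac{1}{1+Q} \qquad (19)$$

and

$$\Phi\!\left[\frac{c - \mu(\theta_0 + \varepsilon)}{\sigma(\theta_0 + \varepsilon)}\right] = \frac{1}{1+Q}. \qquad (20)$$

Once $c_1$ has been computed, the updated $c_2$ value is given as

$$c_2 = c_1 - H(c_1)/H'(c_1).$$

Using $c_2$ as a starting value, the updated value $c_3$ may be found.
The process will end when the change in the c value is sufficiently
small.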

Numerical Illustration

Let a test consist of ten items with parameters listed as
follows:

Item      1     2     3     4     5     6     7     8     9    10
a_i     3.0   1.0   1.0   0.6   0.6   0.3   0.3   0.2   0.2   0.1
b_i    -2.0  -2.0  -1.5  -1.0   0.3   0.6   0.8   2.0   3.0   5.0

In addition, let $\theta_0 = 1.2$, $\varepsilon = 1.0$, and $Q = 2$. Then $\mu(\theta_0) = 6.2875$,
$\sigma(\theta_0) = .7795$, $\mu(\theta_0 + \varepsilon) = 6.5424$, and $\sigma(\theta_0 + \varepsilon) = .6943$. The unit
normal z score at which $\Phi(z) = 1/(1+Q) = 1/3$ is z = .432, hence the
starting value for the Newton-Raphson process is $c_1 = 6.7333$. The
first updated value is $c_2 = 6.1280$. If a tolerance error of .00001
is acceptable, then the iteration process ends at the solution
$c_0 = 6.1487$. At this minimax passing score, the minimax value of
the decision problem is $R_0 = M(c_0) = P(x \ge c_0 \mid \theta_0) = .5707$. With
$R^* = Q/(1+Q) = 2/3$, the efficiency index $\eta$ is $1 - R_0/R^* = .1440$.
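The computation in this illustration can be retraced with a short script. The sketch below is not the program of Appendix A; it is a hedged Python reimplementation whose function names are assumptions. Two choices are worth flagging. First, the variance follows Equation (13) and the Appendix A listing as printed, which weight each term by $a_i$; for independent item responses the variance of $\sum a_i u_i$ would ordinarily carry $a_i^2$, but only the $a_i$ weighting reproduces the reported standard deviations. Second, the starting value uses the lower quantile $\Phi^{-1}(1/(1+Q))$, which is about -.431, whereas the illustration reports z = .432 and $c_1 = 6.7333$; the iteration arrives at the same solution from either start.

```python
import math
from statistics import NormalDist

# Item parameters from the numerical illustration (discrimination a_i, difficulty b_i).
A = [3.0, 1.0, 1.0, 0.6, 0.6, 0.3, 0.3, 0.2, 0.2, 0.1]
B = [-2.0, -2.0, -1.5, -1.0, 0.3, 0.6, 0.8, 2.0, 3.0, 5.0]


def p_correct(a, b, theta):
    """Two-parameter logistic item characteristic function, Equation (11)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def score_moments(theta):
    """Mean and standard deviation of x as in Equations (12) and (13)."""
    probs = [p_correct(a, b, theta) for a, b in zip(A, B)]
    mean = sum(a * p for a, p in zip(A, probs))
    # Weighted by a_i (not a_i**2), following Equation (13) as printed.
    var = sum(a * p * (1.0 - p) for a, p in zip(A, probs))
    return mean, math.sqrt(var)


def minimax_passing_score(theta0, eps, q, tol=1e-5, max_iter=50):
    """Solve H(c) = 0 of Equation (16) by Newton-Raphson."""
    std_normal = NormalDist()
    m1, s1 = score_moments(theta0)
    m2, s2 = score_moments(theta0 + eps)
    z = std_normal.inv_cdf(1.0 / (1.0 + q))
    c = 0.5 * ((m1 + z * s1) + (m2 + z * s2))   # average of the two starting guesses
    for _ in range(max_iter):
        z1, z2 = (c - m1) / s1, (c - m2) / s2
        h = std_normal.cdf(z1) + q * std_normal.cdf(z2) - 1.0
        h_prime = std_normal.pdf(z1) / s1 + q * std_normal.pdf(z2) / s2
        step = h / h_prime
        if abs(step) < tol:
            break
        c -= step
    r0 = q * std_normal.cdf((c - m2) / s2)       # minimax value R_0
    r_star = q / (q + 1.0)                       # minimax value without test data
    return c, r0, 1.0 - r0 / r_star


c0, r0, eta = minimax_passing_score(theta0=1.2, eps=1.0, q=2.0)
print(f"c0 = {c0:.4f}, R0 = {r0:.4f}, eta = {eta:.4f}")
```

Run with the ten items above, the printed values should land close to the passing score, minimax value, and efficiency index reported in the illustration.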

6. AN ITEM SELECTION PROCEDURE FOR CONSTANT LOSSES 

Consider now the task of selecting n items for a test from a
pool consisting of N items. (Conceptually, N may be infinite.)
Which items should be selected? Lord (1980) proposes that items
should be selected in such a way that the item responses would show
the highest degree of information at $\theta_0$ (for the case where $\varepsilon = 0$).
While it appears clear that there is a direct relationship between
test information and the reduction of decision errors, it seems
desirable to base the selection of test items on the efficiency
index $\eta$, which is derived from (minimax) decision theory in a more
direct way than is test information.

Since the efficiency index is $\eta = 1 - R_0/R^*$ and since $R^*$ is
constant, the highest efficiency would occur when the minimax value
$R_0$ is at its minimum. When the test score can be assumed to be
continuous, $R_0$ is either $P(x \ge c_0 \mid \theta = \theta_0)$ or $Q\,P(x < c_0 \mid \theta = \theta_0 + \varepsilon)$.
Thus, the selection of the items must be such that these two
quantities are simultaneously as small as possible.

Except for the case of equal item difficulties and equal item
discriminations, the probabilities which define the minimax value
$R_0$ involve the item parameters in a rather complex manner. Hence,
the optimum selection of items would require the complete
enumeration of all the possible item combinations. The number
$\binom{N}{n}$ of combinations may be very large; thus, for large item pools,
optimality in selection of items does not appear to justify the
computing costs at the present time.

An approximate solution for item selection may be obtained by
noting that, at each passing score c, $P(x \ge c \mid \theta = \theta_0)$ is an increasing
function of each individual probability $p_i(\theta_0)$, and that
$Q\,P(x < c \mid \theta = \theta_0 + \varepsilon)$ is an increasing function of each individual
component $Q\,q_i(\theta_0 + \varepsilon) = Q(1 - p_i(\theta_0 + \varepsilon))$. Hence, at each c, the maximum
value M(c) would be small if $p_i(\theta_0)$ and $Q(1 - p_i(\theta_0 + \varepsilon))$ are
simultaneously small. (This cannot be true if $\varepsilon = 0$.) Hence, the
selection of items may be accomplished as follows. (i) For each item
i, compute the maximum $\delta_i$ of $p_i(\theta_0)$ and $Q(1 - p_i(\theta_0 + \varepsilon))$. (ii) Select
the n items for which the $\delta_i$ values are the smallest.

Numerical Illustration

With the item parameters documented in the numerical illustration
found in Section 5, the $\delta_i$ values are given as follows:

Item        1     2     3     4     5     6     7     8     9    10
delta_i  1.00   .96   .94   .79   .63   .76   .79   .98  1.08  1.14

Thus, if five items are to be selected for the decision situation
under consideration, they would be the ones indexed by the numbers
3, 4, 5, 6, and 7. The efficiency index computed from the normal
approximation is $\eta = .1411$. It may be interesting to note that the
selection procedure based on maximum information (at $\theta_0 + \varepsilon/2$) would
result in the items with numbers 4, 5, 6, 7, and 8. The efficiency
index for this selection is .1163. To gain some insight into the
selection procedure based on $\delta$, a random selection of items was
conducted and resulted in the items 1, 3, 4, 8, and 10. The
corresponding efficiency index was found to be .1086.

The numerical illustration seems to indicate that the procedure
based on maximum item information may not be the best way to select
test items in the context of minimax decision theory. In addition,
though this procedure and the one based on minimum $\delta$ value appear
to select a fair number of common items, the $\delta$ procedure seems to be
more consistent with the minimax decision approach to mastery testing.
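Steps (i) and (ii) of the selection procedure are simple enough to sketch in a few lines. The code below is an illustrative Python sketch, with hypothetical function names, applied to the ten items of Section 5 with $\theta_0 = 1.2$, $\varepsilon = 1.0$, and $Q = 2$.

```python
import math

# Item parameters from the numerical illustration of Section 5.
A = [3.0, 1.0, 1.0, 0.6, 0.6, 0.3, 0.3, 0.2, 0.2, 0.1]
B = [-2.0, -2.0, -1.5, -1.0, 0.3, 0.6, 0.8, 2.0, 3.0, 5.0]


def p_correct(a, b, theta):
    """Two-parameter logistic item characteristic function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_delta(a, b, theta0, eps, q):
    """Step (i): the larger of p_i(theta0) and Q * (1 - p_i(theta0 + eps))."""
    return max(p_correct(a, b, theta0),
               q * (1.0 - p_correct(a, b, theta0 + eps)))


def select_items(n_select, theta0, eps, q):
    """Step (ii): keep the n_select items with the smallest delta values."""
    deltas = [item_delta(a, b, theta0, eps, q) for a, b in zip(A, B)]
    ranked = sorted(range(len(deltas)), key=lambda i: deltas[i])
    chosen = sorted(i + 1 for i in ranked[:n_select])   # 1-based item numbers
    return deltas, chosen


deltas, chosen = select_items(n_select=5, theta0=1.2, eps=1.0, q=2.0)
print([round(d, 2) for d in deltas])
print(chosen)
```

The rule ranks items one at a time, so its cost grows linearly with the pool size, in contrast to the full enumeration of item subsets that an exactly optimal selection would need.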

7. A COMPUTER PROGRAM FOR THE CASE OF CONSTANT LOSSES 

Appendix A provides the listing of a FORTRAN computer program
which is written for the analysis of decisions based on the minimax
principle. Input data to the program are (i) a title card; (ii) a
card providing the data for n, $\theta_0$, $\theta_0 + \varepsilon$, and Q; (iii) an input
format card for reading each pair $(a_i, b_i)$; and (iv) n cards of item
parameters. For example, the input data for the numerical example
of Section 5 is listed in Table 1. Table 2 lists the output of the
program.

TABLE 1
An Example of Input Data

AN EXAMPLE OF MINIMAX DECISION ANALYSIS
        10   1.20000   2.20000   2.00000    .43200
(2F10.5)
       3.0      -2.0
       1.0      -2.0
       1.0      -1.5
       0.6      -1.0
       0.6       0.3
       0.3       0.6
       0.3       0.8
       0.2       2.0
       0.2       3.0
       0.1       5.0

TABLE 2
An Example of Output from the Computer Program

MINIMAX DECISION ANALYSIS FOR THE TWO-PARAMETER
LOGISTIC MODEL. TITLE OF THIS PROBLEM IS:
AN EXAMPLE OF MINIMAX DECISION ANALYSIS
NUMBER OF ITEMS   10

INDIFFERENCE ZONE ON THE ABILITY THETA SCALE
   LOWER LIMIT (THETA-ZERO) .   1.20000
   UPPER LIMIT (THETA-ZERO
       PLUS EPSILON).           2.20000

LOSS RATIO Q ...    2.00000
TOLERANCE ERROR     0.00001

ITEM PARAMETERS
ITEM ID      DISCR.       DIFF.
    1        3.000      -2.000
    2        1.000      -2.000
    3        1.000      -1.500
    4        0.600      -1.000
    5        0.600       0.300
    6        0.300       0.600
    7        0.300       0.800
    8        0.200       2.000
    9        0.200       3.000
   10        0.100       5.000

NORMAL APPROXIMATION FOR TEST SCORES
AT LIMITS OF INDIFFERENCE ZONE

LOWER LIMIT : MEAN .....     6.288
              S.D. .....     0.694

UPPER LIMIT : MEAN .....     6.542
              S.D. .....     0.694

MINIMAX VALUES
   WITH USE OF TEST SCORES ....    0.57067
   WITH NO USE OF TEST SCORES ..   0.66667

FINAL RESULTS

FINAL MINIMAX PASSING SCORE   6.14872
DECISION EFFICIENCY           0.14400

8. AN APPROXIMATE SOLUTION FOR MINIMAX
PASSING SCORES UNDER NORMAL LOSSES

Novick and Lindley (1978) indicated that in most practical
applications, a more realistic form of utility (and consequently,
of the loss function) would be the normal ogive family. Let
$\Psi(x) = e^x/(1+e^x)$ be the logistic function. Then (Haley, 1952, p. 7)
$\Psi(1.7z)$ and the unit normal distribution $\Phi(z)$ differ by less than
.01 uniformly in z. For this reason, and for the computational
simplicity associated with the logistic function, the two functions
$\Phi(z)$ and $\Psi(1.7z)$ will be used interchangeably in this section.

The normal (or logistic) form for the two loss functions
(disutilities) $L_1(\theta)$ for action $a_1$ and $L_2(\theta)$ for action $a_2$ may be
written as

$$L_1(\theta) = 1/\left(1 + e^{\alpha_1(\theta - \beta_1)}\right) \qquad (21)$$

and

$$L_2(\theta) = Q\,e^{\alpha_2(\theta - \beta_2)}/\left(1 + e^{\alpha_2(\theta - \beta_2)}\right). \qquad (22)$$

In these expressions, $\alpha_1$ and $\alpha_2$ are positive constants. Constant
losses correspond to the degenerate case in which $\beta_1 = \beta_2$ and
$\alpha_1 = \alpha_2 = \infty$.

Now let $\theta_0$ be the solution of $L_1(\theta_0) = L_2(\theta_0)$. This quantity
may be obtained via a typical Newton-Raphson iteration process.
Given $\theta_0$, the opportunity losses are given as follows:

$$C_s(\theta) = \begin{cases} L_2(\theta) - L_1(\theta) & \text{for } \theta \ge \theta_0 \\ 0 & \text{for } \theta < \theta_0 \end{cases} \qquad (23)$$

and

$$C_f(\theta) = \begin{cases} 0 & \text{for } \theta \ge \theta_0 \\ L_1(\theta) - L_2(\theta) & \text{for } \theta < \theta_0. \end{cases} \qquad (24)$$

At each potential passing score c, the risk $R(c;\theta)$ of Equation (2)
is equal to

$$R(c;\theta) = \begin{cases} (L_1(\theta) - L_2(\theta))\,P(x \ge c \mid \theta) & \text{for } \theta < \theta_0 \\ (L_2(\theta) - L_1(\theta))\,P(x < c \mid \theta) & \text{for } \theta \ge \theta_0. \end{cases} \qquad (25)$$

Consider first the situation where $\theta < \theta_0$. At $\theta = \theta_0$,
$(L_1(\theta) - L_2(\theta))\,P(x \ge c \mid \theta)$ is zero. As $\theta$ approaches $-\infty$, this
(positive) quantity moves to 0. Hence there exists a value $\theta_1$ at
which this function reaches a maximum. Let $L_1(c)$ be this maximum.
Likewise, let $L_2(c)$ be the maximum of $(L_2(\theta) - L_1(\theta))\,P(x < c \mid \theta)$ when
$\theta \ge \theta_0$. Then $M(c) = \max\{L_1(c), L_2(c)\}$, and the minimax passing score
is the test score $c_0$ at which M(c) is the smallest.

Given c, both $L_1(c)$ and $L_2(c)$, and hence M(c), may be obtained
via numerical procedures such as the Newton-Raphson iteration
process. The process is rather involved; however, it can be simplified
by replacing the two probabilities $P(x \ge c \mid \theta)$ and $P(x < c \mid \theta)$ by two
appropriate logistic functions. Let $\mu(\theta)$ and $\sigma(\theta)$ be the mean and
standard deviation described in Section 5. Then, approximately,

$$P(x < c \mid \theta) = e^y/(1 + e^y)$$

and

$$P(x \ge c \mid \theta) = 1/(1 + e^y),$$

where $y = 1.7(c - \mu(\theta))/\sigma(\theta)$. By using these logistic expressions,
the two derivatives with respect to $\theta$ which form the basis for the
Newton-Raphson process will involve only rational forms of the
exponential functions, and thus can be obtained without undue
difficulty.
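The quality of the logistic substitution rests on the bound attributed to Haley (1952), namely that the logistic function with scale 1.7 stays within .01 of the unit normal distribution function. A quick numerical check is sketched below; the grid limits and step count are arbitrary choices made for this illustration, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Unit normal distribution function via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def logistic(x):
    """Logistic function e^x / (1 + e^x)."""
    return 1.0 / (1.0 + math.exp(-x))


def max_abs_gap(scale=1.7, lo=-6.0, hi=6.0, steps=12001):
    """Largest |Phi(z) - logistic(scale * z)| over an evenly spaced grid."""
    width = (hi - lo) / (steps - 1)
    grid = (lo + k * width for k in range(steps))
    return max(abs(normal_cdf(z) - logistic(scale * z)) for z in grid)


gap = max_abs_gap()
print(f"largest gap on the grid: {gap:.5f}")
```

The gap is largest at moderate distances from zero, so the approximation is least accurate for passing scores roughly two standard deviations away from $\mu(\theta)$.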

The location of the test score $c_0$ at which the maximum risk
M(c) is minimized is somewhat tedious, since the algebraic form of
M(c) as a function of c is not known explicitly. Hence numerical
procedures such as the Newton-Raphson iteration may not be
applicable. It may be noted, however, that the test score x varies
from 0 to the maximum $x_m = \sum_{i=1}^{n} a_i$ via only a finite number of
points. (When all item discriminations are equal, x can take only
n+1 points; these may be taken conveniently as 0, 1, 2, ..., n.) The
location of the minimax passing score $c_0$ may now be accomplished by
computing the value of M(c) at several equally spaced points in the
interval $(0, x_m)$, and then by selecting the point at which M(c) is
the smallest. A refinement of this approach may be carried out by
plotting M(c) against c, and then by drawing a smooth curve through
the points (c, M(c)). The place at which the smooth curve reaches
its lowest point may then be taken as the minimax passing score.
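The grid search described above can be sketched as follows. This is an illustrative Python sketch, not the author's program: the loss parameters `ALPHA_1`, `BETA_1`, `ALPHA_2`, and `BETA_2` are hypothetical values chosen only for demonstration, the item parameters are borrowed from the illustration of Section 5, the crossing point $\theta_0$ is found by bisection rather than by Newton-Raphson, and the suprema over $\theta$ are approximated on a finite grid.

```python
import math

A = [3.0, 1.0, 1.0, 0.6, 0.6, 0.3, 0.3, 0.2, 0.2, 0.1]
B = [-2.0, -2.0, -1.5, -1.0, 0.3, 0.6, 0.8, 2.0, 3.0, 5.0]
Q = 2.0
ALPHA_1, BETA_1 = 2.0, 1.2   # hypothetical parameters of L_1, Equation (21)
ALPHA_2, BETA_2 = 2.0, 2.2   # hypothetical parameters of L_2, Equation (22)


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def loss_grant(theta):
    """L_1: loss for granting mastery, nonincreasing in theta."""
    return logistic(-ALPHA_1 * (theta - BETA_1))


def loss_deny(theta):
    """L_2: loss for denying mastery, nondecreasing in theta."""
    return Q * logistic(ALPHA_2 * (theta - BETA_2))


def prob_fail(c, theta):
    """Logistic approximation to P(x < c | theta) with y = 1.7 (c - mu) / sigma."""
    probs = [logistic(a * (theta - b)) for a, b in zip(A, B)]
    mean = sum(a * p for a, p in zip(A, probs))
    sd = math.sqrt(sum(a * p * (1.0 - p) for a, p in zip(A, probs)))
    return logistic(1.7 * (c - mean) / sd)


def find_theta0(lo=-10.0, hi=10.0):
    """Bisection on L_2 - L_1, which is nondecreasing in theta."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if loss_deny(mid) - loss_grant(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_risk(c, theta0, thetas):
    """M(c): largest opportunity-loss risk of Equation (25) over the theta grid."""
    worst = 0.0
    for t in thetas:
        if t < theta0:
            risk = (loss_grant(t) - loss_deny(t)) * (1.0 - prob_fail(c, t))
        else:
            risk = (loss_deny(t) - loss_grant(t)) * prob_fail(c, t)
        worst = max(worst, risk)
    return worst


def grid_minimax(n_c=101, n_theta=401):
    theta0 = find_theta0()
    x_max = sum(A)
    thetas = [-6.0 + 12.0 * k / (n_theta - 1) for k in range(n_theta)]
    cs = [x_max * k / (n_c - 1) for k in range(n_c)]
    risks = [max_risk(c, theta0, thetas) for c in cs]
    best = min(range(n_c), key=risks.__getitem__)
    return theta0, cs[best], risks[best], risks


theta0, c0, m0, risks = grid_minimax()
print(f"theta0 = {theta0:.4f}, c0 = {c0:.4f}, M(c0) = {m0:.4f}")
```

A finer grid around the returned `c0` plays the role of the smooth-curve refinement mentioned in the text.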

9. ITEM SELECTION UNDER NORMAL LOSSES 

The item selection process described in Section 6 for the case 
of constant losses may be generalized to normal losses as follows: 

1. For each item, compute the maximum risk defined as

$$\delta_i = \max_{\theta}\,\{L_1(\theta)\,p_i(\theta) + L_2(\theta)\,(1 - p_i(\theta))\} \qquad (26)$$

where

$$p_i(\theta) = \exp[a_i(\theta - b_i)]/\{1 + \exp[a_i(\theta - b_i)]\}.$$

2. Then select the n items which show the smallest $\delta$ values.
10. SUMMARY

This paper provides a minimax decision framework in which
three issues in mastery testing based on the two-parameter logistic
model are approached. The issues deal with setting passing scores,
assessing decision efficiency, and selecting items to maximize
decision efficiency. The losses or disutilities under consideration
have constant or normal ogive form. It is found that, within the
context of minimax decisions, the item selection procedure based on
maximum information may not provide the best decision efficiency.
APPENDIX A

A Computer Program for Minimax Decision Analysis
for the Two-Parameter Logistic Model
under Constant Losses

Disclaimer: This program has been written with care and tested
under a variety of conditions. The author, however, makes no
warranty as to its accuracy and functioning, nor shall the fact of
its distribution imply such warranty.
C     A FORTRAN PROGRAM FOR THE COMPUTATION OF MINIMAX PASSING SCORE
C     AND DECISION EFFICIENCY FOR THE TWO-PARAMETER LOGISTIC MODEL
C     WITH CONSTANT LOSSES WHICH ARE EQUAL TO ZERO OVER A SELECTED
C     INDIFFERENCE ZONE. THE NORMAL APPROXIMATION IS USED TO DESCRIBE
C     THE CONDITIONAL DISTRIBUTION OF THE TEST SCORE AT EACH ABILITY
C     LEVEL, HENCE THE PROGRAM IS APPROPRIATE WHEN THE NUMBER OF TEST
C     ITEMS IS SUFFICIENTLY LARGE.
C
C     INPUT DATA CARDS ARE:
C     FIRST CARD:  TITLE OF THE PROBLEM. ENTER ANYTHING YOU WANT.
C     SECOND CARD: ENTER THE FOLLOWING DATA, USING THE FORMAT
C                  (I10,3F10.5)
C                  N.... NUMBER OF ITEMS
C                  T1... LOWER LIMIT OF THE INDIFFERENCE ZONE
C                  T2... UPPER LIMIT OF THE INDIFFERENCE ZONE
C                  Q.... LOSS RATIO
C     THIRD CARD:  INPUT FORMAT FOR THE READING OF EACH PAIR OF
C                  ITEM PARAMETERS. AN EXAMPLE IS (2F10.5).
C     FOLLOWING IN THE INPUT DECK ARE N CARDS, EACH CARD
C                  CONTAINING THE DISCRIMINATION AND DIFFICULTY OF ONE
C                  ITEM, KEYPUNCHED IN THAT ORDER.
C
C     THE PROGRAM IS SET UP FOR TESTS WITH UP TO 200 ITEMS. IF THERE
C     ARE MORE THAN 200 ITEMS, SIMPLY CHANGE THE DIMENSIONS OF A AND B
C     IN THE FOLLOWING DIMENSION STATEMENT TO A(N) AND B(N).
C***********************************************************************
      DIMENSION A(200),B(200),FCT(20)
C     READ AND ECHO THE TITLE CARD (A IS USED AS SCRATCH STORAGE)
    5 READ(5,95,END=99) (A(I),I=1,20)
   95 FORMAT(20A4)
      WRITE(6,195) (A(I),I=1,20)
  195 FORMAT('1','MINIMAX DECISION ANALYSIS FOR THE TWO-PARAMETER'/
     *  T2,'LOGISTIC MODEL. TITLE OF THIS PROBLEM IS:'/T2,20A4)
      READ(5,100) N,T1,T2,Q
  100 FORMAT(I10,3F10.5)
      TOL=.00001
      READ(5,95) FCT
      WRITE(6,200) N,T1,T2,Q,TOL
  200 FORMAT(T2,'NUMBER OF ITEMS ',I4//
     *  T2,'INDIFFERENCE ZONE ON THE ABILITY THETA SCALE'/
     *  T2,'   LOWER LIMIT (THETA-ZERO) .',F10.5/
     *  T2,'   UPPER LIMIT (THETA-ZERO '/
     *  T2,'       PLUS EPSILON).',F10.5//
     *  T2,'LOSS RATIO Q ...',F10.5/
     *  T2,'TOLERANCE ERROR ',F10.5//
     *  T2,'ITEM PARAMETERS'/
     *  T2,'ITEM ID      DISCR.       DIFF.'/)
C     READ AND ECHO THE ITEM PARAMETERS. D IS THE ITEM INDEX DELTA OF
C     SECTION 6 AND FOR IS THE ITEM INFORMATION MEASURE AT THE MIDPOINT
C     OF THE INDIFFERENCE ZONE. NEITHER IS PRINTED IN THIS LISTING.
      DO 10 I=1,N
      READ(5,FCT) A(I),B(I)
      P1=EXP(A(I)*(T1-B(I)))
      P1=P1/(1.+P1)
      P2=EXP(A(I)*(T2-B(I)))
      P2=Q*(1.-P2/(1.+P2))
      D=P1
      IF(P1.LT.P2) D=P2
      FOR=EXP(A(I)*((T1+T2)/2-B(I)))
      FOR=A(I)*FOR/((1+FOR)**2)
   10 WRITE(6,220) I,A(I),B(I)
  220 FORMAT(T4,I4,F12.3,F12.3)
      CALL SCORE(N,A,B,T1,T2,TOL,Q,CZERO,ETA)
      WRITE(6,230) CZERO,ETA
  230 FORMAT(//T2,'FINAL RESULTS'//
     *  T2,'FINAL MINIMAX PASSING SCORE',F10.5/
     *  T2,'DECISION EFFICIENCY        ',F10.5//)
C     RETURN FOR THE NEXT PROBLEM. END OF INPUT BRANCHES TO 99.
      GOTO 5
   99 WRITE(6,245)
  245 FORMAT(T2,'** NORMAL END OF JOB **'/
     *  T2,'   PROGRAM WRITTEN BY'/
     *  T2,'   HUYNH HUYNH'/
     *  T2,'   COLLEGE OF EDUCATION'/
     *  T2,'   UNIVERSITY OF SOUTH CAROLINA'/
     *  T2,'   COLUMBIA, SOUTH CAROLINA 29208'/
     *  T2,'   JULY 1980')
      STOP
      END
      SUBROUTINE SCORE(N,A,B,T1,T2,TOL,Q,CZERO,ETA)
      DIMENSION A(1),B(1)
C     AA IS 1/SQRT(2*PI), THE CONSTANT OF THE UNIT NORMAL DENSITY
      AA=1./6.28318**.5
      P=1./(Q+1.)
      CALL NORMAL(P,CZERO)
C     MEANS AND STANDARD DEVIATIONS OF THE TEST SCORE AT T1 AND T2
      XM1=0.
      XM2=0.
      SD1=0.
      SD2=0.
      DO 10 I=1,N
      P1=EXP(A(I)*(T1-B(I)))
      P1=P1/(1.+P1)
      P2=EXP(A(I)*(T2-B(I)))
      P2=P2/(1.+P2)
      XM1=XM1+A(I)*P1
      XM2=XM2+A(I)*P2
      SD1=SD1+A(I)*P1*(1.-P1)
   10 SD2=SD2+A(I)*P2*(1.-P2)
      SD1=SD1**.5
      SD2=SD2**.5
C     THE PUBLISHED LISTING PRINTED SD2 FOR BOTH STANDARD DEVIATIONS
C     (HENCE 0.694 TWICE IN TABLE 2) AND REFERRED TO FORMAT 200.
C     SD1 AND FORMAT 100 ARE USED HERE.
      WRITE(6,100) XM1,SD1,XM2,SD2
  100 FORMAT(/T2,'NORMAL APPROXIMATION FOR TEST SCORES'/
     *  T2,'AT LIMITS OF INDIFFERENCE ZONE'//
     *  T2,'LOWER LIMIT : MEAN .....',F10.3/
     *  T2,'              S.D. .....',F10.3//
     *  T2,'UPPER LIMIT : MEAN .....',F10.3/
     *  T2,'              S.D. .....',F10.3/)
C     STARTING VALUE FOR THE NEWTON-RAPHSON PROCESS
      CZERO=(XM1+XM2+(SD1+SD2)*CZERO)/2.
      WRITE(6,205) CZERO
  205 FORMAT(T2,'STARTING CZERO',F10.5)
C     NEWTON-RAPHSON ITERATION ON H(C) OF EQUATION (16)
   20 Z1=(CZERO-XM1)/SD1
      Z2=(CZERO-XM2)/SD2
      H=.5*ERFC(-.7071068*Z1)+Q*.5*ERFC(-.7071068*Z2)-1.
      HP=AA*(1./SD1*EXP(-Z1**2/2)+Q/SD2*EXP(-Z2**2/2))
      D=H/HP
      IF(ABS(D).LT.TOL) GOTO 30
      CZERO=CZERO-D
      WRITE(6,210) CZERO
  210 FORMAT(T2,'UPDATED CZERO ',F10.5)
      GOTO 20
C     MINIMAX VALUES WITH AND WITHOUT TEST DATA, AND EFFICIENCY
   30 RZERO=Q*.5*ERFC(-.7071068*Z2)
      RSTAR=Q/(Q+1.)
      WRITE(6,220) RZERO,RSTAR
  220 FORMAT(T2,'MINIMAX VALUES'/
     *  T2,'   WITH USE OF TEST SCORES ....',F10.5/
     *  T2,'   WITH NO USE OF TEST SCORES ..',F10.5)
      ETA=1.-RZERO/RSTAR
      RETURN
      END
      SUBROUTINE NORMAL(P,X)
C     RATIONAL APPROXIMATION TO THE UNIT NORMAL DEVIATE WITH
C     CUMULATIVE PROBABILITY P
      D=P
      IF(D-.5) 9,9,8
    8 D=1.-D
    9 T2=ALOG(1./(D*D))
      T=SQRT(T2)
      X=T-(2.515517+0.802853*T+0.010328*T2)/(1.0+1.432788*T+
     1  0.189269*T2+0.001308*T*T2)
      IF(P-0.5) 10,10,11
   10 X=-X
   11 RETURN
      END
A VIEW ON THE FUTURE OF MASTERY TESTING

Anthony J. Nitko 

University of Pittsburgh 

These remarks were made as part of the symposium "First year of 
the Mastery Testing Project. Technical advances, applications, and 
conjectures" at the annual meeting of the American Educational 
Research Association, Boston, April 7-11, 1980. 

As is pointed out in the Overview, the Mastery Testing Project 
has made important strides in solving several psychometric problems 
associated with setting cutting scores on tests for the purpose of 
making mastery decisions. It has been encouraging that the research 
has taken as its central concern making effective and consistent
decisions. This focus has contributed to the reformulation of test- 
ing issues in the decision context — away from the traditional view 
of the measurement of individual differences and toward a view of 
classification decisions within the context of instruction. 

A second encouraging aspect which contributes to a future view 
of mastery testing is the project's use of the binomial error model 
and the beta-binomial distribution. In the past, most testers have 
applied decision theoretic statistical methods to a normal distri- 
bution model, assuming that both measurement error and ability are 
distributed normally. The Mastery Testing Project has broken with 
this tradition. In a formal and rigorous way, the project has shown 
that other assumptions about the mathematical form of human behavior 
can be plausible. Thus, solutions to testing and classification 
problems can be modeled on distributions other than the normal dis- 
tribution. Eventually, this work will help to dispel the en- 
chantment of test users with the nineteenth century view that human 
abilities are "naturally" normally distributed. Unleashed from the 
constraints of a Gaussian view, new vistas of human accomplishments 
are possible in the future. 
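
As a brief illustration of the model referred to above, the following sketch uses common textbook notation rather than the project's own symbols. It assumes that an examinee with true proportion-correct score ζ answers each of n items independently, so that the observed score x is binomial, and that ζ varies over examinees according to a beta distribution with parameters α and β. The symbols n, x, ζ, α, and β are introduced here only for illustration.

```latex
% Binomial error model: score x on n items, given true proportion zeta
f(x \mid \zeta) = \binom{n}{x}\,\zeta^{x}(1-\zeta)^{n-x}, \qquad x = 0, 1, \ldots, n

% Beta density describing how zeta varies across examinees
g(\zeta) = \frac{\zeta^{\alpha-1}(1-\zeta)^{\beta-1}}{B(\alpha,\beta)}, \qquad 0 < \zeta < 1

% Marginal (beta-binomial) distribution of observed scores
f(x) = \int_0^1 f(x \mid \zeta)\, g(\zeta)\, d\zeta
     = \binom{n}{x}\,\frac{B(\alpha + x,\; \beta + n - x)}{B(\alpha,\beta)}
```

With α = β = 1, for instance, every score from 0 to n has probability 1/(n+1), a flat distribution quite unlike the normal curve.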

The strong true score model adopted by the Mastery Testing 
Project has helped to advance a broader view of what it means to 
have a "reliable" test. This means that in the future test developers 
will be more concerned with the consistency of decisions made using 
test scores than they have been in the past. Further, wider use of the raw 
agreement and kappa indices is to be expected. In addition, since 
these indices have a broader application than in mastery testing alone, 
and since their statistical form has been rigorously traced by the 
studies of the Mastery Testing Project, there should be a spillover of 
the technical knowledge gained in this project to other areas. 
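
For readers unfamiliar with the two indices, the following sketch states their usual definitions for a mastery/nonmastery decision made on two test administrations. The notation is an assumption chosen for illustration: p_{ij} is the proportion of examinees classified i on the first administration and j on the second, with 1 for master and 0 for nonmaster. The numbers in the worked case are hypothetical, not project data.

```latex
% Raw agreement: proportion classified the same way both times
p_0 = p_{11} + p_{00}

% Agreement expected by chance from the marginal proportions
p_c = p_{1\cdot}\,p_{\cdot 1} + p_{0\cdot}\,p_{\cdot 0}

% Kappa: agreement beyond chance, rescaled so that perfect agreement gives 1
\kappa = \frac{p_0 - p_c}{1 - p_c}

% Hypothetical case: p_{11} = 0.60,\; p_{00} = 0.20,\; p_{10} = p_{01} = 0.10
p_0 = 0.80, \qquad
p_c = (0.70)(0.70) + (0.30)(0.30) = 0.58, \qquad
\kappa = \frac{0.80 - 0.58}{1 - 0.58} \approx 0.52
```

In this hypothetical case the decisions agree for 80 percent of examinees, but only about half of the possible improvement over chance agreement is realized.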

The Mastery Testing Project has focused on only one view of what 
it means to be a master. The findings of the studies reported here 
will give tremendous credibility to this one view of mastery because 
they have put it on a technically rigorous psychometric foundation. 
In this view of mastery, a "master" is one who can perform correctly 
more of essentially the same kind of task. What is to be learned is 
conceived of essentially as a large domain of test items. The test 
administrator selects a random (or representatively random) sample 
of items from this domain and administers them to the examinee. The 
tester's interest is in estimating either the number or percentage of 
the tasks in the domain to which the examinee can respond correctly. 
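
A small worked case may make this view concrete. It assumes, for illustration only, that items are sampled at random, so that the number correct x on an n-item test is binomial given the examinee's domain proportion ζ. The values n = 10, ζ = 0.8, and passing score c = 8 are hypothetical.

```latex
% Estimate of the proportion of the domain the examinee can answer
\hat{\zeta} = \frac{x}{n}

% Probability that an examinee with domain proportion zeta reaches passing score c
P(x \ge c \mid \zeta) = \sum_{k=c}^{n} \binom{n}{k}\,\zeta^{k}(1-\zeta)^{n-k}

% Hypothetical case: n = 10,\; \zeta = 0.8,\; c = 8
P(x \ge 8 \mid 0.8) = 45(0.8)^{8}(0.2)^{2} + 10(0.8)^{9}(0.2) + (0.8)^{10}
                    \approx 0.3020 + 0.2684 + 0.1074 = 0.6778
```

Under these assumptions, an examinee who can answer 80 percent of the domain would still fall below the passing score on roughly one administration in three.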

This is a useful model for a number of learning objectives, 
especially at an elementary, minimal competence level. But the model 
tends to equate mastery with information store and to limit this store 
to verbal information. This view is appropriate, for example, when 
estimating the proportion of simple addition facts known, or the number 
of three-digit, two-addend arithmetic problems that can be solved. 

In the future, one can speculate that such a view will not be 
applicable to other important learning problems. Cognitive 
psychologists, for example, have studied the differences between 
"expert" and "novice" performers of complex problem-solving tasks. 
They find that experts differ from novices on qualitative attributes, 
not just on the amount of information stored. For example, on 
inductive reasoning tasks, Pellegrino and Glaser (1979) found that 
competent performers have (a) better management of memory, (b) better 
knowledge of the constraints in a given problem-solving situation, 
and (c) better representation of the structure or organization of the 
knowledge base that is relevant to the problem at hand. 

Teaching and learning directed toward this latter, more cognitive 
view of what it means to have competence or mastery is quite different 
from the "domain of tasks" view currently adopted by most educationists. 
In the future, we can expect that the cognitive view will offer 
insights into how to diagnose learning problems and design teaching 
that addresses the qualitative aspects of competence, not just its 
quantitative aspects. 

But these newer cognitive views of mastery are not yet ready to 
be applied. A great deal of research remains to be done before the 
state of knowledge is at a level where application to test 
development is possible. Thus, the lag between these psychological views 
and development of psychometric theory is to be expected and we cannot 
fault the Mastery Testing Project for not attending to these issues. 
It is the nature of the beast that psychometric theorists have to 
wait until psychological problems are better formulated before 
attempting to apply quantitative methods to their solutions. Perhaps 
at the end of the fourth year of the Mastery Testing Project, it can 
be reported that Huynh and his colleagues have applied their 
tremendous talents to the measurement of a new kind of mastery or 
expertise. 